  • Number of reads in RNA-seq?

    Another biologist here.

    I got RNA-seq data (in .txt format) back from the sequencing facility. Each file, from two lanes of high-throughput sequencing, is nearly 25 GB.

    I want to know the number of reads in each file. Which program or code should I use?

    Thank you.

  • #2
    Try the Linux command "wc -l <filename>". This counts the number of lines in the file. If the file is in FASTQ format, each read takes four lines (header, sequence, '+' separator, quality), so divide the number of lines by 4 to get the number of reads.
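A minimal sketch of this approach as a reusable Bash function, assuming an uncompressed FASTQ file with exactly four lines per read (no wrapped sequence lines). The name `count_fastq_reads` is hypothetical. Because `wc -l` counts newline characters, a file whose last line lacks a trailing newline comes out one line short, so the function also warns when the line count is not a multiple of 4.

```bash
#!/usr/bin/env bash
# Count reads in an uncompressed FASTQ file as (number of lines) / 4.
# Assumes the standard 4-line record layout; count_fastq_reads is a hypothetical name.
count_fastq_reads() {
  local file=$1 lines
  # Redirecting the file into wc makes it print only the number, not the name.
  lines=$(wc -l < "$file")
  # A remainder means a truncated file, wrapped records, or a missing final newline.
  if (( lines % 4 != 0 )); then
    echo "warning: $file has $lines lines, not a multiple of 4" >&2
  fi
  echo $(( lines / 4 ))
}
```

Usage: `count_fastq_reads sample.fastq` prints the read count for that (placeholder) file; any warning goes to stderr so it does not mix with the number.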

    • #3
      Why do we use -l in the command?
      Last edited by ashiq.hussain; 07-12-2011, 12:17 PM.

      • #4
        It's not -1, it's -l; the command wc -l counts the number of lines in the file. The easiest way to find the number of reads in a FASTQ file with one command is grep "@" -c <filename>. Hope this helps.

        • #5
          Originally posted by upendra_35
          its not -1 it is -l and the command wc -l counts the number of lines in the file. The easiest way to find out the number of reads in a fastq file with one command is grep "@" -c <filename>. Hope this helps
          This will not work! The '@' character can appear in FASTQ quality lines as well as in the sequence-ID line. See this thread for a discussion of the problems with using grep to count reads in FASTQ files.
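A small demonstration of the problem, using a made-up two-read FASTQ file (not real data) whose second quality string begins with '@'. In the common Phred+33 encoding '@' stands for quality 31, so such a line is perfectly legal. Both grep counts come out as 3, while the line count divided by 4 gives the correct 2.

```bash
#!/usr/bin/env bash
# Tiny made-up FASTQ file: the second quality line starts with '@' (Q31 in Phred+33).
tmp=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n@@II\n' > "$tmp"

n_any=$(grep -c "@" "$tmp")            # lines containing '@' anywhere
n_start=$(grep -c "^@" "$tmp")         # lines starting with '@'
n_lines=$(( $(wc -l < "$tmp") / 4 ))   # line count divided by 4

echo "grep @: $n_any  grep ^@: $n_start  wc -l / 4: $n_lines"
rm -f "$tmp"
```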

          • #6
            Thank you all.

            • #7
              Sorry, I missed the "^" before the @. It should be grep "^@" -c <filename>. However, I agree with kmcarr that '@' may appear in FASTQ quality lines as well (sometimes at the start of the line), so even that may not work. The most reliable way is two steps: wc -l <filename>, followed by expr <no.of.lines> / 4.
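The two steps can also be done in a single awk command. A minimal sketch, assuming an uncompressed FASTQ file with four lines per read; `reads_by_awk` is a hypothetical name. If the line count is not a multiple of 4, awk prints a fractional number, which is a quick hint that the file is not clean 4-line FASTQ.

```bash
#!/usr/bin/env bash
# One-step read count: awk reads every line, and NR holds the total
# number of lines when the END block runs. Assumes 4 lines per read.
reads_by_awk() {
  awk 'END { print NR / 4 }' "$1"
}
```

Usage: `reads_by_awk sample.fastq` (placeholder file name).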

              • #8
                Compressed files

                Does it work with .gz files, or do I have to unpack them?

                • #9
                  No, use the power of piping:

                  gzip -d -c input.fastq.gz | wc -l
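The same pipe can report reads instead of lines. A minimal sketch, assuming a gzip-compressed FASTQ file with four lines per read; `count_gz_reads` is a hypothetical name. `gzip -dc` streams the decompressed text to stdout, so nothing is written to disk (on most Linux systems `zcat` does the same).

```bash
#!/usr/bin/env bash
# Count reads in a gzip-compressed FASTQ file without unpacking it to disk.
# Assumes 4 lines per read; count_gz_reads is a hypothetical name.
count_gz_reads() {
  local lines
  # Decompress to stdout and count the lines of the uncompressed stream.
  lines=$(gzip -dc -- "$1" | wc -l)
  echo $(( lines / 4 ))
}
```

Usage: `count_gz_reads input.fastq.gz`.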

                  • #10
                    I don't mean to revive a closed thread, but just in case someone needs to know the number of reads in multiple bzip2-compressed FASTQ files:

                    for file in *.bz2; do echo "$file"; b=$(bzcat "$file" | wc -l); echo $((b/4)); done
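A sketch that generalizes the loop to a mix of gzip, bzip2 and uncompressed FASTQ files, picking the decompressor from the file extension. It assumes accurate extensions, four lines per read, and that `gzip` and `bzip2` are installed; `count_reads_any` is a hypothetical name. Quoting "$f" keeps file names containing spaces intact.

```bash
#!/usr/bin/env bash
# For each file argument, print "<file><TAB><reads>" (assumes 4 lines per read).
# The decompressor is chosen from the extension (assumption: extensions are accurate).
count_reads_any() {
  local f lines
  for f in "$@"; do
    case $f in
      *.gz)  lines=$(gzip -dc -- "$f" | wc -l) ;;
      *.bz2) lines=$(bzip2 -dc -- "$f" | wc -l) ;;
      *)     lines=$(wc -l < "$f") ;;
    esac
    printf '%s\t%d\n' "$f" $(( lines / 4 ))
  done
}
```

For example, `shopt -s nullglob; count_reads_any *.fastq *.fastq.gz *.fastq.bz2` prints one tab-separated line per file; `nullglob` makes a pattern with no matches expand to nothing instead of being passed through literally.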

                    • #11
                      Great. Thanks. Very useful for me.

                      • #12
                        I didn't intend to revive this thread, but as I was searching for a way to automate FASTQ read counting, this solution may come in handy for some:

                        find . -name "*.fastq" | while IFS= read -r i; do echo "$i" >> project_nbread.txt; grep -c "^@$(head -n 1 "$i" | awk -F '[@:]' '{ print $2 }')" "$i" >> project_nbread.txt; done
                        In the case of only one file, you can use this:
                        grep -c "^@$(head -n 1 file.fastq | awk -F '[@:]' '{ print $2 }')" file.fastq
                        This counts the lines that start with '@' followed by the instrument ID taken from the first read's header, i.e. the number of FASTQ reads. It assumes Illumina-style headers in which the instrument ID is followed by ':'; anchoring the pattern with "^@" keeps it from also counting '+' separator lines that repeat the header.
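A stricter alternative sketch, not the poster's method: rather than searching for a header string, walk the file in 4-line records with awk and check each record's structure (header starts with '@', separator starts with '+', quality string as long as the sequence). It assumes unwrapped 4-line records; on a malformed file it reports the offending line on stderr and exits with status 1 instead of printing a misleading count. `validate_and_count` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Count FASTQ reads while checking the 4-line record structure.
# On malformed input: message on stderr, nothing on stdout, exit status 1.
validate_and_count() {
  awk '
    function fail(msg) {
      printf("line %d: %s\n", NR, msg) > "/dev/stderr"
      bad = 1
      exit 1
    }
    NR % 4 == 1 && substr($0, 1, 1) != "@" { fail("header does not start with @") }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { fail("separator does not start with +") }
    NR % 4 == 0 && length($0) != seqlen    { fail("quality length differs from sequence length") }
    END {
      if (bad) exit 1
      if (NR % 4 != 0) { print "incomplete final record" > "/dev/stderr"; exit 1 }
      print NR / 4
    }
  ' "$1"
}
```

Because the header check looks only at the first line of each record, a quality line that happens to start with '@' is not mistaken for a read.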
