  • bio_informatics
    Senior Member
    • Nov 2013
    • 182

    find files which contain a string

    Hi members,

    Pardon me if this looks like a trivial question.
    I know this has been asked several times on Stack Overflow, Unix forums and many other places.

    I'm having a hard time finding a file using 'find'. I have tens of folders, each of which contains tens more folders, and so on.
    The names of the folders and of the file I am interested in are way too long.

    Example:

    Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz

    I want to find the file whose name contains "96_CN_11_B21_M1_C3_P2".

    Code:
    find . -name "*.fastq.*" | xargs grep "96_CN"
    Code:
     find . -name "*.fastq.gz" -exec egrep -Hn "96_CN_" {} \;
    Both commands take ages. I waited more than 90 minutes for output and they were still running.

    Please advise.
    Again, sorry for such a small query.
    Last edited by bio_informatics; 03-04-2015, 05:27 AM.
    Bioinformaticscally calm
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    I assume you are looking to locate file names/paths?

    If you need to do this frequently, consider creating a database of file/folder names like so:

    Code:
    $ updatedb --require-visibility 0 -U /folder_hierarchy_to_search -o mydb
    Then you can use the "locate" command to find file names/paths very rapidly:

    Code:
    $ locate -d mydb filename_to_search
    Creating the database could take a significant amount of time, but it would be worth it.

    • bio_informatics
      Senior Member
      • Nov 2013
      • 182

      #3
      Hi Genomax,
      Thank you for your reply.
      I will be working on these files for a few months; after that there will be new data with new file names.
      Creating a database is a nice idea, but it does not seem worthwhile when new data floods in on a regular basis.

      I will post this on another forum as well (and will mention this post there).
      Bioinformaticscally calm

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        This is not a regular database, and it may be the fastest way of finding files. See http://linux-sxs.org/utilities/updatedb.html and http://en.wikipedia.org/wiki/Locate_%28Unix%29.

        You could run the update as a daily cron job so that you would not need to worry about it.
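
A daily refresh of the kind suggested above could be sketched roughly as follows. This is only an illustration: the 02:00 schedule, the `$HOME/mydb` location and the variable names are assumptions, while the `updatedb` flags are the ones quoted in post #2. Whether users may install a crontab on a shared cluster depends on local policy.

```shell
#!/usr/bin/env bash
# Sketch: build a crontab entry that rebuilds a private locate database daily.
# Assumptions (not from the thread): the 02:00 schedule and the database path.
search_root="/folder_hierarchy_to_search"   # placeholder path from post #2
db_file="$HOME/mydb"                        # assumed location for the database

# Five cron fields (minute hour day-of-month month day-of-week), then the command.
cron_line="0 2 * * * updatedb --require-visibility 0 -U $search_root -o $db_file"

# Print the entry for review; nothing is installed by this script.
echo "$cron_line"

# To install it after checking, one common approach is:
#   (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
```

Lookups would then go through `locate -d "$HOME/mydb" filename_to_search`, as in post #2, and would reflect the tree as of the last nightly run.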

        • bio_informatics
          Senior Member
          • Nov 2013
          • 182

          #5
          Thank you for your valuable suggestions. I didn't know that anything of this type existed.
          I'm on a cluster and do not have many permissions.
          I will definitely take these suggestions into consideration.
          Bioinformaticscally calm

          • bio_informatics
            Senior Member
            • Nov 2013
            • 182

            #6
            Here is what I was doing wrong:
            find . -name "*.fastq.*" | xargs grep "96_CN"
            This finds the .fastq files and then greps inside each of them for "96_CN".
            The fastq.gz files are compressed binary files, so searching their contents was bound to take ages.

            I didn't try this on small directories first.

            --
            The correct command turned out to be:

            Code:
            time find . -name "*96_CN_11_B21_M1_C3_P2*"
            This returned the paths and file names I needed, with the following timing:
            real 0m0.434s
            user 0m0.037s
            sys 0m0.152s
            Last edited by bio_informatics; 03-04-2015, 05:44 AM.
            Bioinformaticscally calm
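
The difference described in post #6 (matching on file names instead of reading file contents) can be tried on a throwaway directory first. The sketch below is a minimal illustration, not the poster's actual setup: the `run1`/`run2` folder layout, the second file name and the helper `find_by_name` are all made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: search by file NAME rather than grepping file contents.
set -euo pipefail

# Build a small temporary tree (folder names and the second file are hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/run1/sampleA" "$demo_dir/run2/sampleB"
touch "$demo_dir/run1/sampleA/Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz"
touch "$demo_dir/run2/sampleB/Isolate_12_CN_07_B03_M1_C1_P1_GGATTAGG_L001_R1_001.fastq.gz"

# find_by_name DIR FRAGMENT: print regular files whose name contains FRAGMENT.
# Only directory entries are examined; no file is opened or decompressed.
find_by_name() {
    find "$1" -type f -name "*$2*"
}

matches=$(find_by_name "$demo_dir" "96_CN_11_B21_M1_C3_P2")
echo "$matches"
```

Because `find -name` only compares names, its cost grows with the number of directory entries and not with the size of the .fastq.gz files, which is consistent with the sub-second timing reported above.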

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Can you try this?

              Code:
              $ find . -type f -name "*.fastq.*" | grep "96_CN"

              • bio_informatics
                Senior Member
                • Nov 2013
                • 182

                #8
                Genomax:
                I was making a horrible error in my command.
                A fresh mind in the morning picked it up instantly.
                Bioinformaticscally calm

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Consider adding "-type f" to your find command since you are only looking for files.
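
The effect of "-type f" can be seen with a small experiment. The sketch below assumes a hypothetical layout in which a folder name also contains the search fragment; the folder name and variable names are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch: compare find with and without -type f.
set -euo pipefail

# Temporary tree where BOTH a directory and a file contain the fragment
# (the directory name is hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/96_CN_11_B21_M1_C3_P2_run"
touch "$demo_dir/96_CN_11_B21_M1_C3_P2_run/Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz"

# Without -type f, the matching directory is reported as well as the file.
all_hits=$(find "$demo_dir" -name "*96_CN_11_B21_M1_C3_P2*" | wc -l)

# With -type f, only regular files are reported.
file_hits=$(find "$demo_dir" -type f -name "*96_CN_11_B21_M1_C3_P2*" | wc -l)

echo "without -type f: $all_hits"
echo "with -type f: $file_hits"

rm -rf "$demo_dir"
```

In this layout the first search reports two entries (the folder and the file) and the second reports one, so "-type f" keeps folder names out of the results when only files are wanted.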

                  • bio_informatics
                    Senior Member
                    • Nov 2013
                    • 182

                    #10
                    That worked.

                    Thanks very much for your help and time.
                    Bioinformaticscally calm
