  • bio_informatics
    Senior Member
    • Nov 2013
    • 182

    find files which contain a string

    Hi members,

    Pardon me if this seems like a trivial question.
    I know this has been asked several times on Stack Overflow, Unix & Linux and many other forums.

    I'm having a hard time finding a file using 'find'. I have tens of folders, each of them containing tens more folders, and so on.
    The names of the folders and of the file I am interested in are very long, for example:

    Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz

    I want to find the file whose name contains "96_CN_11_B21_M1_C3_P2".

    Code:
    find . -name "*.fastq.*" | xargs grep "96_CN"
    Code:
     find . -name "*.fastq.gz" -exec egrep -Hn "96_CN_" {} \;
    The commands above take ages. I have waited more than 90 minutes for the output and they were still running.

    Please advise.
    Again, sorry for such a small question.
    Last edited by bio_informatics; 03-04-2015, 05:27 AM.
    Bioinformaticscally calm
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    I assume you are looking to locate file names/paths?

    If you need to do this frequently, consider creating a database of file/folder names, like so:

    Code:
    $ updatedb --require-visibility 0 -U /folder_hierarchy_to_search -o mydb
    Then you can use the "locate" command to find file names/paths very rapidly:

    Code:
    $ locate -d mydb filename_to_search
    Creating the database could take a significant amount of time, but it would be worth it.


    • bio_informatics
      Senior Member
      • Nov 2013
      • 182

      #3
      Hi GenoMax,
      Thank you for your reply.
      I will be working on these files for a few months, and then new data with new file names will arrive.
      Creating a database is a nice idea, but it does not seem worth it when new data floods in on a regular basis.

      I will post this on another forum (and will mention this post there as well).

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        This is not a regular database, and it may be the fastest way of finding files (see http://linux-sxs.org/utilities/updatedb.html and http://en.wikipedia.org/wiki/Locate_%28Unix%29).

        You could run this as a daily cron job so you would not need to worry about it.
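A minimal sketch of such a cron job, assuming the mlocate-style `updatedb` options used above and that your system lets ordinary users run cron jobs (many clusters do not, so check first). The 02:00 schedule and the `$HOME/mydb` output path are illustrative placeholders, not values from this thread.

```shell
# Hypothetical crontab entry (add it with `crontab -e`):
# rebuild the private locate database every day at 02:00.
# /folder_hierarchy_to_search and $HOME/mydb are placeholders.
0 2 * * * updatedb --require-visibility 0 -U /folder_hierarchy_to_search -o "$HOME/mydb"
```

With the database in place, `locate -d "$HOME/mydb" 96_CN_11_B21_M1_C3_P2` should list matching paths; in mlocate, a pattern without wildcard characters is treated as `*PATTERN*`. Files created after the last rebuild will not appear until the next run.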


        • bio_informatics
          Senior Member
          • Nov 2013
          • 182

          #5
          Thank you for your valuable suggestions. I didn't know anything like this existed.
          I'm on a cluster and do not have many privileges.
          I will definitely take these suggestions into consideration.

          • bio_informatics
            Senior Member
            • Nov 2013
            • 182

            #6
            Here is what I was doing wrong:
            find . -name "*.fastq.*" | xargs grep "96_CN"
            This finds the .fastq files and then greps inside each of them for "96_CN".
            The fastq.gz files are compressed binary files, so grepping through them would definitely take ages.
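To make the distinction concrete, here is a small self-contained sketch (not from the original thread) that builds a throwaway directory with one hypothetical gzipped FASTQ file and compares a name match against the content search above. The name match only inspects directory entries, while the `xargs grep` pipeline opens and reads every file; and because the files are gzip-compressed, grep sees compressed bytes rather than the FASTQ text, so a genuine content search would need decompression (e.g. `zgrep`) and would be slow regardless.

```shell
#!/bin/sh
# Sketch: matching on file NAMES vs. file CONTENTS.
# The directory layout and the single read below are hypothetical test data.
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/run1/sampleA"
# A tiny gzipped FASTQ whose NAME contains the sample ID
# but whose CONTENTS do not.
printf '@read1\nACGT\n+\nIIII\n' \
  | gzip -c > "$demo/run1/sampleA/Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz"

# Fast: compares names only; no file is opened.
by_name=$(find "$demo" -type f -name '*96_CN_11_B21_M1_C3_P2*')

# Slow on real data: xargs hands every path to grep, which reads every
# byte of every file (and here those bytes are gzip-compressed).
by_content=$(find "$demo" -type f -name '*.fastq.*' -print0 \
  | xargs -0 grep -l '96_CN' || true)

echo "by name:    $by_name"
echo "by content: ${by_content:-<no match>}"
rm -rf "$demo"
```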

            I didn't try this on small directories first.

            --
            The correct command turned out to be:

            Code:
            time find . -name "*96_CN_11_B21_M1_C3_P2*"
            This returned the paths and file names I needed, with timing:
            real 0m0.434s
            user 0m0.037s
            sys 0m0.152s
            Last edited by bio_informatics; 03-04-2015, 05:44 AM.

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Can you try this?

              Code:
              $ find . -type f -name "*.fastq.*" | grep "96_CN"


              • bio_informatics
                Senior Member
                • Nov 2013
                • 182

                #8
                GenoMax:
                I was making a horrible error in my command.
                In the morning, with a fresh mind, I spotted it instantly.

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Consider adding "-type f" to your find command since you are only looking for files.
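The effect of `-type f` can be seen in a small sketch with a hypothetical layout in which a directory and the FASTQ file inside it both contain the sample ID. Without `-type f`, `find` reports both; with it, only the regular file is reported.

```shell
#!/bin/sh
# Sketch: why -type f matters when folder names also contain the sample ID.
# The layout below is hypothetical test data.
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/Isolate_96_CN_11_B21_M1_C3_P2"
touch "$demo/Isolate_96_CN_11_B21_M1_C3_P2/Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz"

# Without -type f: the matching directory is reported as well as the file.
all_matches=$(find "$demo" -name '*96_CN_11_B21_M1_C3_P2*' | wc -l | tr -d ' ')
# With -type f: only regular files are reported.
file_matches=$(find "$demo" -type f -name '*96_CN_11_B21_M1_C3_P2*' | wc -l | tr -d ' ')

echo "without -type f: $all_matches"
echo "with -type f:    $file_matches"
rm -rf "$demo"
```

Combining the name pattern with `-type f` gives the final command from this thread: `find . -type f -name "*96_CN_11_B21_M1_C3_P2*"`.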


                  • bio_informatics
                    Senior Member
                    • Nov 2013
                    • 182

                    #10
                    That worked.

                    Thanks very much for your help and time.