  • Full Genomic Database and corresponding chromosomal databases

    I am running some experiments using Bowtie and Q-Pick. However, one works with the full human genome database (Bowtie) and the other works with its corresponding per-chromosome databases (for chromosomes 1, 2, 3, ... 23). From here, I found the full human genome database for hg19 (the chromFa.tar.gz archive, which contains 23 chromosome files, one for each chromosome). What I can't work out is this: if I concatenate all 23 files into a single file (say, using the cat command) and give that to Bowtie as input, is that acceptable? That is, do the concatenated chromosome files equal the full genome database? More specifically, each chromosome starts with chr(chromosome number)>; should I keep those tags while concatenating, or remove them?
    Last edited by Arupsss; 06-18-2012, 07:31 AM.

  • #2
    I am not sure I understand your question. Bowtie can work with an entire genome, with chromosomes, or with parts of chromosomes. So there is no need to have one large file, and there are plenty of reasons not to (e.g., ease of manipulation, ease of visualization, etc.). However, if you do wish to concatenate all of the chromosomes together into one large genome file, then leave the '>' part in place. Good luck with your analysis.


    • #3
      Originally posted by westerman View Post
      I am not sure I understand your question. Bowtie can work with an entire genome, with chromosomes, or with parts of chromosomes. So there is no need to have one large file, and there are plenty of reasons not to (e.g., ease of manipulation, ease of visualization, etc.). However, if you do wish to concatenate all of the chromosomes together into one large genome file, then leave the '>' part in place. Good luck with your analysis.
      Thanks a lot. So, while concatenating, suppose I have chr1>NN..AG..NN and chr2>NN..GC...NN. Should I remove the '>', so that the output is chr1NN..AG..NNchr2NN..GC...NN, and then give the concatenated file to Bowtie as input? Am I correct?


      • #4
        Your files should look something like:

        >chr1
        NN..AG..NN

        And the next file should look like:

        >chr2
        NN..GC..NN

        When you cat these files together, leave in the '>' part to get a large file that looks like:

        >chr1
        NN..AG..NN
        >chr2
        NN..GC..NN

        Unless I am misunderstanding your question, this is simple FastA format manipulation.
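
For anyone who prefers a script to cat, the concatenation described above could be sketched in Python roughly as follows. This is only an illustration, not part of Bowtie: the function name `concatenate_fasta` and the file names in the usage note are hypothetical. It copies every line unchanged, so each '>' header line stays in place, and it adds a newline if a file lacks a final one so that the next header does not get glued onto the previous sequence.

```python
def concatenate_fasta(inputs, output):
    """Join FASTA files in the given order, keeping every '>' header line.

    `inputs` is a list of paths to per-chromosome FASTA files and `output`
    is the path of the combined file (both names are illustrative).
    """
    with open(output, "w") as out:
        for path in inputs:
            last_line = ""
            with open(path) as src:
                for index, line in enumerate(src):
                    # A FASTA file must begin with a header such as '>chr1'.
                    if index == 0 and not line.startswith(">"):
                        raise ValueError(f"{path} does not start with a '>' header line")
                    out.write(line)
                    last_line = line
            # If the file had no trailing newline, add one so the next
            # file's header starts on its own line.
            if last_line and not last_line.endswith("\n"):
                out.write("\n")
```

A call such as `concatenate_fasta(["chr1.fa", "chr2.fa"], "genome.fa")` would then produce a file of the shape shown above, assuming those two input files exist.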


        • #5
          Originally posted by westerman View Post
          Your files should look something like:

          >chr1
          NN..AG..NN

          And the next file should look like:

          >chr2
          NN..GC..NN

          When you cat these files together, leave in the '>' part to get a large file that looks like:

          >chr1
          NN..AG..NN
          >chr2
          NN..GC..NN

          Unless I am misunderstanding your question, this is simple FastA format manipulation.
          Yes. I am trying to do that simple FastA format manipulation so that I can give Bowtie a single input file. However, does "'>' part" mean only ">" or ">chr2>"? In the large-file example above you just cat the files together, and no part is dropped.


          • #6
            Save yourself a significant amount of effort and just download the pre-built bowtie indexes for hg19 from here: ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip


            • #7
              Originally posted by GenoMax View Post
              Save yourself a significant amount of effort and just download the pre-built bowtie indexes for hg19 from here: ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip
              Thanks a lot. However, I have many chromosomal sequences (not only for human hg19/hg18), and I have to do this for all of them. I don't think I can get prebuilt indexes for all of them. Another point is that in some cases I have to include or exclude the sex chromosome sequences.
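
The include/exclude requirement could be handled by choosing which per-chromosome files go into the combined FASTA before concatenating. The sketch below is a minimal illustration under stated assumptions: the files are named like `chr1.fa`, `chrX.fa`, and so on, and the sex chromosomes are called `chrX` and `chrY`. Other genomes may use different names, so the `sex_names` parameter is there to be adjusted. The function names are hypothetical.

```python
import re
from pathlib import Path


def chromosome_sort_key(name):
    """Sort chr1, chr2, ..., chr10 numerically, then named ones (chrX, chrY)."""
    match = re.fullmatch(r"chr(\d+)", name)
    if match:
        return (0, int(match.group(1)), "")
    return (1, 0, name)


def select_chromosome_files(directory, include_sex=True, sex_names=("chrX", "chrY")):
    """Return per-chromosome FASTA paths from `directory` in a stable order.

    Assumes one file per chromosome named '<chromosome>.fa'. When
    `include_sex` is False, files whose stem is in `sex_names` are skipped.
    """
    files = {path.stem: path for path in Path(directory).glob("chr*.fa")}
    names = [n for n in files if include_sex or n not in sex_names]
    return [files[n] for n in sorted(names, key=chromosome_sort_key)]
```

The returned list can then be passed, in order, to whatever does the concatenation.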


              • #8
                I guess you are trying to do much of this on Windows. It may be time to put some effort into using a Unix distribution; there are several you can try. You may want to experiment with "Bio-Linux", which has a lot of pre-built bioinformatics apps (http://nebc.nerc.ac.uk/tools/bio-linux/bio-linux-6.0).

                You are bound to run into issues (sooner rather than later) when trying to do this type of analysis on Windows (editing and handling large files is one thing that comes to mind).

                A simple Unix command like "cat file1 file2 file3 > final.fa" would achieve what you were asking about in the original question.
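
After concatenating, it may be worth checking that the combined file still has one '>' header line per chromosome. Here is a small Python sketch of such a check; the function name `fasta_headers` is made up for illustration, and it assumes the common FASTA convention that the sequence name is the text after '>' up to the first whitespace.

```python
def fasta_headers(path):
    """Return the sequence names from each '>' header line, in file order."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Take the text after '>' up to the first whitespace.
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names
```

Running `fasta_headers("final.fa")` on the output of the cat command above should list each chromosome name exactly once, in the order the files were concatenated.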

                Originally posted by Arupsss View Post
                Thanks a lot. However, I have many chromosomal sequences (not only for human hg19/hg18), and I have to do this for all of them. I don't think I can get prebuilt indexes for all of them. Another point is that in some cases I have to include or exclude the sex chromosome sequences.
