  • biouser
    Junior Member
    • Aug 2012
    • 7

    sequence alignment

    Hi all,
    I want to align some read data in FASTA format using the bowtie short-read aligner, but before I align them I need a reference sequence. I'm new to bioinformatics, and searching about reference sequences didn't turn up anything useful about why we need a reference sequence for read alignment.
    Please help me understand that, and how I can download the required reference sequences.

    Thank you all. Justin
    Last edited by biouser; 08-14-2012, 11:46 AM.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    You need something to align against; that is the purpose of the reference sequence. What organism are your sequences from? That will pretty much answer the question of what to download.
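To make the role of the reference concrete: aligning a read means finding where in a known sequence it fits. The toy Python sketch below does only exact string matching on invented sequences. bowtie itself searches an index of the reference (built with `bowtie-build`) and allows mismatches, so this illustrates the idea, not bowtie's algorithm.

```python
def find_exact_matches(read: str, reference: str) -> list[int]:
    """Return every 0-based offset at which `read` occurs exactly in `reference`."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        # Search again one base further on, so overlapping occurrences are kept.
        start = reference.find(read, start + 1)
    return hits


if __name__ == "__main__":
    # Toy sequences invented for illustration; not real human data.
    reference = "ACGTACGTTAGCCGATAGGCTAACGT"
    for read in ["TAGCCGAT", "GGCTAACG", "TTTTTTTT"]:
        positions = find_exact_matches(read, reference)
        if positions:
            print(f"{read}: aligned at offset(s) {positions}")
        else:
            print(f"{read}: failed to align")
```

Run as a script, this reports `TAGCCGAT` at offset 8, `GGCTAACG` at offset 17, and `TTTTTTTT` as failing to align: without a reference there is simply nothing to place the reads against.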


    • biouser
      Junior Member
      • Aug 2012
      • 7

      #3
      I have a FASTA file containing about 15 million reads, which are 454 sequences from Human HapMap, downloaded from a genomic paired-end library at NCBI.


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        So, you have a bunch of reads from a human and you want to know where they map. For that you would need a reference human genome sequence. You could use the one from NCBI or the one from the 1000 Genomes Project (there are probably others; I actually don't know offhand whether the NCBI reference differs from that of the 1000 Genomes Project, as I don't do any human sequencing).


        • biouser
          Junior Member
          • Aug 2012
          • 7

          #5
          On the NCBI FTP site there are two kinds of files: some are in .fa format and others are .rm.out.
          Which one is used as the reference sequence?
          I got the following output from bowtie:

          # reads processed: 15281579
          # reads with at least one reported alignment: 610764 (4.00%)
          # reads that failed to align: 14670815 (96.00%)
          Reported 610764 alignments to 1 output stream(s)

          What does this report mean? Does it mean that 4% of the reads belong to chromosome 2, which I used as the reference sequence?
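The numbers in the report can be checked directly: 610,764 + 14,670,815 = 15,281,579, so every processed read either aligned or failed, and 610,764 / 15,281,579 is about 3.997%, which bowtie prints as 4.00%. The small Python sketch below parses the summary shown above and redoes that arithmetic. The regular expressions assume exactly this line wording; other aligners (including bowtie2) word their summaries differently.

```python
import re

# Summary text copied from the bowtie run shown above.
SUMMARY = """\
# reads processed: 15281579
# reads with at least one reported alignment: 610764 (4.00%)
# reads that failed to align: 14670815 (96.00%)
Reported 610764 alignments to 1 output stream(s)
"""

# Patterns assume the exact line wording shown above.
PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one reported alignment: (\d+)",
    "failed": r"# reads that failed to align: (\d+)",
}


def parse_bowtie_summary(text: str) -> dict[str, int]:
    """Pull the read counts out of a bowtie end-of-run summary."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"no '{key}' line found in summary")
        counts[key] = int(match.group(1))
    return counts


def percent(part: int, whole: int) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    counts = parse_bowtie_summary(SUMMARY)
    # Every processed read either aligned or failed, so the two counts add up.
    assert counts["aligned"] + counts["failed"] == counts["processed"]
    print(f"aligned: {percent(counts['aligned'], counts['processed']):.2f}%")
    print(f"failed:  {percent(counts['failed'], counts['processed']):.2f}%")
```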


          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            You'll want the .fa (FASTA format) files. The .rm.out files are RepeatMasker output.
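A FASTA file is plain text in which each record starts with a header line beginning with `>`, followed by lines of sequence; that is the input `bowtie-build` expects for the reference. A RepeatMasker `.out` file, by contrast, is a table of repeat annotations with no `>` headers. The minimal Python reader below is an illustrative sketch, not a replacement for established parsers such as Biopython's `SeqIO`. It lists the records in a file and fails fast on text that is not FASTA.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs; raise ValueError if the text is not FASTA."""
    name = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""  # first word of the header
            chunks = []
        elif name is None:
            raise ValueError("sequence text before any '>' header; not a FASTA file?")
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


if __name__ == "__main__":
    # Tiny inline example; for a real download you would pass an open file instead.
    example = io.StringIO(">seq1 toy record\nACGTAC\nGT\n>seq2\nGGGCCC\n")
    for record_name, sequence in read_fasta(example):
        print(record_name, len(sequence))
```

The example prints `seq1 8` and `seq2 6`. If a downloaded file is gzip-compressed (`.fa.gz`), `gzip.open(path, "rt")` returns a text handle that can be passed in the same way.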


            • biouser
              Junior Member
              • Aug 2012
              • 7

              #7
              Thank you, dpryan.
              And what about my second question, the bowtie report?


              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                #8
                Originally posted by biouser
                Thank you, dpryan.
                And what about my second question, the bowtie report?
                Ah, I missed that, mea culpa. It really just means that only 4% of the reads aligned. The remainder may not have aligned because (1) they didn't come from chromosome 2, (2) you didn't quality-trim prior to alignment, so some reads couldn't align, or (3) there was untrimmed adapter contamination that caused misalignment. For your real run, I would use the "cat" command to concatenate the various chromosomes into a single file, which would then be indexed and mapped against. Since you're using bowtie, you might be able to download prebuilt indexes from the bowtie website. That'll save you a bit of time!
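Reason (3) refers to adapter trimming. As a rough illustration only, assuming a known 3' adapter sequence (the one used below is made up, not a real 454 adapter), trimming means cutting the read where the adapter starts, including the case where the read runs only partway into the adapter. This exact-match Python sketch ignores sequencing errors, which dedicated trimmers such as cutadapt handle. Quality trimming (reason 2) additionally needs per-base quality scores, which plain FASTA does not store.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter, or a partial adapter at the very end, from a read.

    Exact matching only: real trimming tools also tolerate sequencing errors.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Case 1: the complete adapter occurs in the read, so cut from its first base.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway into the adapter; try the longest overlap first.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


if __name__ == "__main__":
    ADAPTER = "CTGAGACT"  # hypothetical adapter sequence, for illustration only
    for read in ["ACGTACGTCTGAGACTAAAA", "ACGTACGTCTGA", "ACGTACGTTT"]:
        print(f"{read} -> {trim_3prime_adapter(read, ADAPTER)}")
```

The first two example reads are trimmed to `ACGTACGT` (full adapter, then partial adapter), and the third is left unchanged. A smaller `min_overlap` removes more partial adapters but also clips more reads that merely happen to end in an adapter-like prefix.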


                • biouser
                  Junior Member
                  • Aug 2012
                  • 7

                  #9
                  Originally posted by dpryan
                  For your real run, I would use the "cat" command to concatenate the various chromosomes into a single file, which would then be indexed and mapped against. Since you're using bowtie, you might be able to download prebuilt indexes from the bowtie website. That'll save you a bit of time!
                  No problem. Actually, building an index of the reference sequence took only 3 minutes, and the alignment against it took several hours.
                  But is "cat" one of bowtie's commands, or is it available in other software?


                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    #10
                    Ah, I assumed that you're using Linux or a Mac, in which case cat is a standard shell program. If you're using Windows, then I wouldn't have a clue; presumably there's something similar.
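One way to get the same result as `cat chr*.fa > genome.fa` on any operating system, including Windows, is a few lines of Python. The sketch below assumes one FASTA file per chromosome in a directory named `chromosomes` and writes `genome.fa`; both names are hypothetical. The combined file is what would then be indexed with `bowtie-build`.

```python
import os
import shutil
from pathlib import Path


def ends_with_newline(path: Path) -> bool:
    """True if the file is empty or its last byte is a newline."""
    with path.open("rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() == 0:
            return True
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) == b"\n"


def concatenate_fasta(inputs: list[Path], output: Path) -> None:
    """Write the input files one after another into `output`, like `cat a b > out`."""
    with output.open("wb") as out:
        for path in inputs:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks; fine for large files
            # Unlike plain cat, guarantee the next '>' header starts on its own line.
            if not ends_with_newline(path):
                out.write(b"\n")


if __name__ == "__main__":
    # Hypothetical layout: one FASTA file per chromosome in ./chromosomes/
    chromosome_files = sorted(Path("chromosomes").glob("*.fa"))
    if chromosome_files:
        concatenate_fasta(chromosome_files, Path("genome.fa"))
        print(f"wrote {len(chromosome_files)} files to genome.fa")
    else:
        print("no chromosomes/*.fa files found")
```

The output is written outside the input directory so the glob cannot pick up `genome.fa` itself on a second run.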


                    • biouser
                      Junior Member
                      • Aug 2012
                      • 7

                      #11
                      Yes! I forgot about that command. I'll certainly use it.
                      You've helped me a lot, dpryan.
