  • Explain Samtools like I am a 5 year old

    Brief history
    Bioinformatics expertise: Novice
    Objective: Want to compile bam index files to view aligned reads in IGV.
    Starting materials: downloaded Perl, latest samtools, and have ~30 bam files ready and eager to visualize.
    OS: Windows 7 64-bit

    I am not a computer scientist. I have no intention of learning all there is to know about command line bioinformatics. Someone please give me a point-by-point explanation of how I go about generating bam index files, all the way from opening the correct terminal to executing the command. Every manual I have found about samtools assumes that I have some working knowledge of the command line... I have none. I don't know how to proceed from opening a command terminal to building samtools or assigning directories or anything.

    Keep it simple--assume I am 5.

  • #2
    Did you already install samtools or just download the source code from sourceforge? Doing anything like this on Windows, given your knowledge base, is going to be daunting. You might just do the following:

    1) Download this really old version of samtools for windows.
    2) Unzip the file and copy the .exe file to wherever your BAM files are.
    3) Open up Windows Explorer and navigate to wherever your BAM files and the samtools .exe file are.
    4) Right click on one of the BAM files and select "Copy address" (or maybe "Copy address as text", one of them should work).
    5) Click the start button/image and type "cmd" in the search box, then click on the result.
    6) In the terminal, type "cd " (note, there is a space after the "cd" command).
    7) Paste the address that you copied in step 4 (you may have to go to Edit->Paste).
    8) Delete the file name, you just need the path. Then press Enter.
    9) For each file, type "samtools.exe index the_file_name.bam", where "the_file_name.bam" is obviously each file name.
    10) If the files aren't sorted, you'll first have to run "samtools.exe sort original.bam original.sorted" (change original.bam to the file name) and then index the sorted BAM files. In the old 0.1.x releases of samtools the second argument is an output prefix, so this writes original.sorted.bam; newer releases use "samtools sort -o original.sorted.bam original.bam" instead.

    I don't use windows, but this should be approximately correct. If it doesn't work, your best bet is to find someone with a Mac or who runs linux and buy him/her a beer.
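
Typing the index command thirty times gets tedious, so here is a minimal sketch of how step 9 could be repeated for every BAM file in a folder. It uses Python rather than the Perl mentioned in the first post, and it assumes the program is called samtools.exe and can be started from that folder. The function names are made up for this illustration and are not part of samtools.

```python
import subprocess
from pathlib import Path


def index_commands(folder, samtools="samtools.exe"):
    """Build one 'samtools index' command per .bam file in folder."""
    bam_files = sorted(Path(folder).glob("*.bam"))
    return [[samtools, "index", str(bam)] for bam in bam_files]


def index_all(folder, samtools="samtools.exe"):
    """Run each command in turn; raise an error if samtools reports a failure."""
    for command in index_commands(folder, samtools):
        print("Running:", " ".join(command))
        # check=True stops the loop as soon as one file fails to index
        subprocess.run(command, check=True)
```

Calling index_all(".") from the folder that holds the BAM files would run "samtools.exe index" once per file. Unsorted files would still need the sort step first.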

    • #3
      Thanks, dpryan, for that response. An error message is returned that "samtools.exe" is not a recognized command. Regardless, that explanation is just the kind I needed and hopefully someone can polish up the issue even more.

      • #4
        Hmm, I remembered the current directory being added to the %PATH% in Windows; I guess that either changed or I'm misremembering. If you type:
        Code:
        echo %path%
        You'll get a list of directories to which you can move the samtools.exe file. Move it to one of those and the "samtools.exe ..." command should work.
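
If the output of that command is hard to read, the same check can be sketched in Python. This is only an illustration under the assumption that Python is installed. The two helper names are invented here, while os.environ, os.pathsep and shutil.which come from the Python standard library.

```python
import os
import shutil


def path_directories(path_value=None):
    """Split a PATH string into its directories, skipping empty entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(os.pathsep) if entry]


def find_samtools(name="samtools"):
    """Return the full path of the samtools program, or None if it is not found."""
    return shutil.which(name)
```

Printing each entry of path_directories() shows one directory per line, and find_samtools() returning None would suggest that the terminal cannot see the program either.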

        • #5
          Thanks again for the feedback but I solved my problem with an alternative method.
          For those with similar issues, follow this procedure:
          1. Download GenomeBrowse for free at <http://www.goldenhelix.com/GenomeBrowse/>
          2. Install it, then download the reference sequence to your local directory
          3. Upload any .bam files of your choosing. The software will automatically index the bam files (.bai output) to the directory in which the .bam files are contained.
          4. Once indexed, the visual read counts will show up and you can navigate the sequences however you choose.
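
Whichever tool builds the indexes, it can help to confirm that every BAM file ended up with one before opening IGV. Below is a rough Python sketch, assuming the index sits next to the BAM file and is named either file.bam.bai or file.bai. The function name is hypothetical.

```python
from pathlib import Path


def bams_without_index(folder):
    """List .bam files in folder that have no .bam.bai or .bai file beside them."""
    missing = []
    for bam in sorted(Path(folder).glob("*.bam")):
        # Two common naming styles: sample.bam.bai and sample.bai
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(candidate.exists() for candidate in candidates):
            missing.append(bam.name)
    return missing
```

An empty list means every BAM file in the folder has an index file beside it.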

          • #6
            Talk about using a sledgehammer to crack a walnut..
