  • Getting data into Golden Helix Genome Browser

    Hello, I'm trying to get my sorted BAM files to load into Golden Helix's genome browser. I'm currently using version 2.1.0. A lot of my data comes into the browser with the message "Did not successfully write coverage data", among other issues.

    This is the process I've followed:

    1. Download the data from NCBI in FASTQ/FASTA format

    2. Download the reference data from iGenomes

    3. Unzip the iGenomes data in Linux

    4. Build the alignment: bowtie2-align -x "name of built index" ".fastq" -S ".sam"

    5. Convert the SAM file to a BAM file with samtools view -Sb "SAM file" > "BAM file"

    6. samtools sort -n (if not already sorted, for Cufflinks) "BAM file"

    The data will go through tools like Cufflinks/TopHat. The files are also about 7 GB each, but they don't work in the browser.


    What can I do to fix this?

  • #2
    Is there a specific reason to use Golden Helix? Why not use IGV from Broad, if you just want to look at your bam files. You also need to index your sorted bam file (would be step 7 in your workflow).

    • #3
      samtools sort -n
      According to the command you've posted, you're sorting the BAM file by read name.
      To view the BAM file in any genome browser, you'll want to sort it by coordinates. Just samtools sort, without the -n.

      Code:
      samtools sort
      As pointed out by @GenoMax, you'll want to index the sorted BAM file, at least for IGV, and the genome browsers I know.
      I'll second @GenoMax's vote for IGV. Always choose open source software over commercial software, even freeware. That being said, GenomeBrowse may have features that justify picking it over IGV, but IGV is a pretty nifty program.
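One way to confirm how a BAM file was sorted is to look at the SO: tag on the @HD line of its header; samtools sort -n records queryname, while a plain samtools sort records coordinate. The helper below is only a sketch: header_sort_order is a made-up name, and it assumes the header text (as printed by samtools view -H) is piped in on stdin.

```shell
# Print the sort order recorded in a SAM/BAM header read from stdin.
# header_sort_order is a hypothetical helper name, not part of samtools.
header_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      # Scan the tab-separated tags on the @HD line for SO:<value>.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
    }
    # No @HD line or no SO tag: the order is not recorded.
    END { if (!found) print "unknown" }
  '
}

# Usage (requires samtools on PATH; the file name is a placeholder):
#   samtools view -H your_file.bam | header_sort_order
```

A genome browser generally needs this to report coordinate before the file can be indexed and displayed.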

      • #4
        I want to compare things in the liver across multiple SRA datasets. I will try sorting it and report back.

        • #5
          Considering my current method, can you give me a quick rundown on how to load it into IGV? I'm not getting anything.

          What is the process for going from FASTA or FASTQ or whatever format to IGV?

          • #6
            Your "process" up to step 5 is ok.

            6. samtools "sort" (no -n) your_file.bam your_file_sorted
            7. samtools "index" your_file_sorted.bam

            Start IGV and point it to the directory containing the sorted BAM and the .bai index file. If you are not able to figure things out from there, read the IGV user guide. Remember to select the correct genome build before loading the sorted BAM file.
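For anyone on a newer samtools, steps 6 and 7 can be sketched as below. This assumes samtools 1.3 or later, where sort takes an output file via -o rather than an output prefix as in the older syntax above. sort_and_index is a hypothetical wrapper name and the file names are placeholders.

```shell
# Coordinate-sort a BAM file and index the result.
# sort_and_index is a hypothetical wrapper; both arguments are placeholder paths.
# Assumes samtools 1.3 or newer, where `sort` takes -o <output file>.
sort_and_index() {
  in_bam=$1
  out_bam=$2
  # Coordinate sort is the default when -n is not given.
  samtools sort -o "$out_bam" "$in_bam" || return 1
  # Writes the index next to the sorted BAM as <out_bam>.bai.
  samtools index "$out_bam"
}

# Usage:
#   sort_and_index your_file.bam your_file_sorted.bam
```

Keeping the .bai file in the same directory as the sorted BAM lets IGV pick it up when the BAM is loaded.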

            • #7
              I gave it a run with a mouse genome and the .bam file. Got this:

              [Screenshot of the IGV window; image not preserved]

              As you can see, the file is over a gigabyte, so there is data in there.

              • #8
                Did you miss the note up in the main browser window that says "Zoom in to see alignments"?

                By default IGV shows you the entire genome. You have to select a chromosome (or type a gene name in the "go" box to select a region). Even then you may have to click on the "+" sign in the top right corner before you start seeing actual reads aligned to the genome. You can keep going until you actually see individual bases.

                • #9
                  I zoomed all the way in and didn't see anything. Nothing is showing up.

                  • #10
                    My guess is that your BAM file has chromosome names that do not match what is provided by IGV in terms of the reference (e.g. chr2 vs 2). That is assuming you have selected the correct reference (I see Mouse 129S1 etc in the screenshot above).

                    Where did you get your reference genome from? If you are using a non-standard genome then you can load your own reference sequence in and use it to display data against.

                    Can you post the header from your bam?

                    Code:
                    $ samtools view -H your_bam
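To pull just the chromosome names out of that header, something like the following could work. It is a sketch only: header_ref_names is a made-up helper name, and it assumes the header text is piped in on stdin. It prints the SN: value of every @SQ line, which is what has to match the names in the reference loaded in the browser.

```shell
# Print the reference (chromosome) names from a SAM/BAM header on stdin.
# header_ref_names is a hypothetical helper name, not part of samtools.
header_ref_names() {
  awk -F'\t' '
    $1 == "@SQ" {
      # Each @SQ line carries one reference name in its SN: tag.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) print substr($i, 4)
    }
  '
}

# Usage (requires samtools on PATH; your_bam is a placeholder):
#   samtools view -H your_bam | header_ref_names
```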

                    • #11
                      Thanks for the help. I got the reference from iGenomes. I'm trying to get this mouse data to show.

                      • #12
                        Which version did you download from iGenomes? This looks like NCBI or Ensembl since UCSC versions have the word chr in front of the chromosome number. This is certainly not the 129S1 mouse genome as you had selected in the screenshot above.

                        I am going to suggest that you use the sequence and the annotation in your iGenomes download in IGV so everything matches and you can display the data. See the "Loading a genome" section of the IGV user guide.
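A quick way to tell which naming convention a file uses is to check whether its reference names carry the "chr" prefix. The sketch below is illustrative only: chrom_style is a made-up helper name, its labels are arbitrary, and it assumes one reference name per line on stdin (for example, the SN: values taken from the BAM header).

```shell
# Classify reference names read from stdin (one per line) by "chr" prefix.
# chrom_style is a hypothetical helper; the labels it prints are arbitrary.
chrom_style() {
  awk '
    /^chr/  { pre++ }    # names such as chr1, chrX (UCSC convention)
    !/^chr/ { bare++ }   # names such as 1, X, MT (NCBI/Ensembl convention)
    END {
      if (pre > 0 && bare == 0)      print "chr-prefixed"
      else if (pre == 0 && bare > 0) print "no-prefix"
      else if (pre > 0)              print "mixed"
      else                           print "empty"
    }
  '
}

# Usage: run it once on the BAM names and once on the reference names,
# then compare the two answers.
```

If the BAM and the genome loaded in the browser give different answers, that mismatch is a likely reason for an empty track.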

                        • #13
                          I downloaded the NCBI GRCm38 version. Should I use something else? Which one would you use?

                          • #14
                            In IGV, just pick the mm10 genome instead.
                            It's the same genome as GRCm38.
                            I think you'll have to reload the BAM file after you've selected the correct genome.

                            Then, zoom into a location where you know you will have coverage.
                            For RNA-Seq for example, you could pick a housekeeping gene, like GAPDH.
                            Just type GAPDH in the search box, and click on the Go button.

                            • #15
                              @blancha: mm10 may not work if the one included in IGV is UCSC version which has the "chr" in front of all chromosome numbers.

                              @Milestailsprowe: If the above does not work, create a new "genome" in IGV by pointing to the iGenomes sequence file (/path_to/WholeGenomeFasta/genome.fa) and use the corresponding GTF file (/path_to/Annotation/Genes/genes.gtf). Then open your BAM file in IGV.
