  • Mapping on Windows 7 PC (Galaxy etc.)

    Dear all,

    I would like to map some RNA-seq data using BWA, which can be done in Galaxy, for instance. However, I have a problem with the reference genome: it is mm9 plus one additional custom chromosome (I'm looking for particular occurrences of fusion events). I have the chromosomes in FASTA format, downloaded from UCSC, but how do I make a multi-FASTA file, or even a len file, for mapping? I have tried opening the files in text editors, but the applications run out of memory.

  • #2
    If you have to use Windows, you can do quite a lot at the command line - especially if you install Cygwin for a Linux-like environment. You could also do sequence manipulation with a scripting language like Perl or Python - both BioPerl and Biopython should be fine under Windows.

    However, in the long run I think you will find sequencing data analysis easier under Linux or Mac OS X (which is a type of Unix) than on Windows - simply because these are the platforms most of the cutting-edge tools are designed and tested on.

    Alternatively, sticking with Galaxy, you can concatenate FASTA files together to make a new reference (i.e. combine the mm9 FASTA with your custom chromosome FASTA) using their "Concatenate datasets" tool.
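
As a rough illustration of the same concatenation step outside Galaxy, the sketch below streams the FASTA files line by line, so memory use stays small even for a whole genome (unlike opening them in a text editor). It also tabulates sequence lengths. The tab-separated name/length layout written by `write_len_file` is an assumption about what a len file should contain, and all function names are hypothetical.

```python
def concatenate_fasta(inputs, output):
    """Write every input FASTA file to `output`, one after another."""
    with open(output, "w") as out:
        for path in inputs:
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Make sure the next file's ">" header starts on its own line.
            if not last.endswith("\n"):
                out.write("\n")


def fasta_lengths(path):
    """Return [(sequence_name, length), ...] in file order."""
    lengths = []
    name, total = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    lengths.append((name, total))
                fields = line[1:].split()
                # Use the first word of the header as the sequence name.
                name, total = (fields[0] if fields else ""), 0
            elif name is not None:
                total += len(line)
    if name is not None:
        lengths.append((name, total))
    return lengths


def write_len_file(lengths, path):
    """Write one 'name<TAB>length' line per sequence (assumed layout)."""
    with open(path, "w") as out:
        for name, length in lengths:
            out.write(f"{name}\t{length}\n")
```

For example, `concatenate_fasta(["mm9.fa", "custom.fa"], "mm9_custom.fa")` followed by `write_len_file(fasta_lengths("mm9_custom.fa"), "mm9_custom.len")`, where the file names are placeholders.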


    • #3
      Thanks for your reply. I think I will stick with Galaxy for now, as I am still in the learning phase of data analysis. However, in the long run we may set up a dedicated workstation.

      Do you perhaps know if it is possible to trim paired-end reads to a shorter length? I have 101 bp reads; is it possible to trim them to, e.g., 50 bp and map on that basis?


      • #4
        Originally posted by puggie
        Do you perhaps know if it is possible to trim paired-end reads to a shorter length? I have 101 bp reads; is it possible to trim them to, e.g., 50 bp and map on that basis?
        Yes, but why do that? Having longer reads should give more specificity for the mapping. Trimming the reads based on their individual qualities makes more sense - I think there are Galaxy videocasts/tutorials on this kind of thing.
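
To make the idea of quality-based trimming concrete, here is a minimal sketch that removes low-quality bases from the 3' end of a single read. It is a deliberately simple rule (strip trailing bases below a threshold), not the algorithm used by BWA or by any Galaxy tool. The Phred+33 quality encoding, the threshold of 20, and the function name are all assumptions; older Illumina data may use a +64 offset instead.

```python
def trim_low_quality_tail(seq, qual, min_phred=20, offset=33):
    """Drop trailing bases whose Phred score is below `min_phred`.

    `offset` is the ASCII offset of the quality encoding (assumed 33).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end while the last kept base is low quality.
    while end > 0 and ord(qual[end - 1]) - offset < min_phred:
        end -= 1
    return seq[:end], qual[:end]
```

Unlike cutting every read to 50 bp, this keeps the full length of reads that are good throughout and shortens only those with a poor tail.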


        • #5
          Originally posted by maubp
          Yes, but why do that? Having longer reads should give more specificity for the mapping. Trimming the reads based on their individual qualities makes more sense - I think there are Galaxy videocasts/tutorials on this kind of thing.

          Yes, I agree. The thing is that we are investigating a special kind of chimera, where we expect many reads to capture chimeric sequences. Hence a number of reads will themselves be chimeric. In Galaxy I can apply filters to select read pairs in which each mate maps to a different chromosome (a mouse chromosome and our custom chromosome). However, will we lose data if, say, Read 1 is chimeric within its 101 bp length?

          It should be noted that many transcripts will start from our custom chromosome, and the portion of a read that matches the custom chromosome may be shorter than 101 bp.

          That is why I figured I could do the mapping with full-length reads, then repeat it with shorter reads and compare.
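
The mate-pair filter described above (each mate mapping to a different chromosome, one of them the custom chromosome) can be sketched on SAM output as follows. This is only an illustration, not what Galaxy runs internally. It relies on the standard SAM columns (FLAG in column 2, RNAME in column 3, RNEXT in column 7) and flag bits 0x1 (paired), 0x4 (read unmapped) and 0x8 (mate unmapped). The name `chrCustom` and the function names are placeholders.

```python
def is_interchromosomal_pair(sam_line, custom="chrCustom"):
    """True if both mates are mapped, to different references,
    and one of those references is `custom` (a placeholder name)."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # not a complete alignment record
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    paired = bool(flag & 0x1)
    any_unmapped = bool(flag & 0x4) or bool(flag & 0x8)
    if not paired or any_unmapped:
        return False
    # "=" means the mate is on the same reference; "*" means unknown.
    if rname == "*" or rnext in ("=", "*") or rnext == rname:
        return False
    return custom in (rname, rnext)


def filter_sam(lines, custom="chrCustom"):
    """Yield header lines plus records passing the pair filter."""
    for line in lines:
        if line.startswith("@") or is_interchromosomal_pair(line, custom):
            yield line
```

Running the same filter on the full-length mapping and on the trimmed mapping would give two sets of read names that can then be compared.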


          • #6
            I can see why you might try this now. Have you explored the "Trim (leading or trailing characters)" tool in Galaxy? It can handle FASTQ reads.
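
For comparison, fixed-length trimming outside Galaxy can be sketched in a few lines. This is not the Galaxy tool's implementation, only an illustration that assumes unwrapped FASTQ (exactly four lines per read). The function names and the default of 50 bases are hypothetical.

```python
def trim_fastq_records(lines, keep=50):
    """Yield FASTQ lines with sequence and quality cut to `keep` bases."""
    record = []
    for line in lines:
        if not record and not line.strip():
            continue  # ignore blank lines between records
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record: {header!r}")
            # Sequence and quality must be cut at the same position.
            yield from (header, seq[:keep], plus, qual[:keep])
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def trim_fastq_file(src, dst, keep=50):
    """Trim every read in `src` and write the result to `dst`."""
    with open(src) as handle, open(dst, "w") as out:
        for line in trim_fastq_records(handle, keep):
            out.write(line + "\n")
```

For paired-end data, applying the same call to both files keeps the mates in the same order, since no reads are dropped.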


            • #7
              Originally posted by maubp
              I can see why you might try this now. Have you explored the "Trim (leading or trailing characters)" tool in Galaxy? It can handle FASTQ reads.
              I will try out that tool. I'm concatenating my genome assembly now. Thanks.

              EDIT: One more thing... maubp, can you please tell me what speed you get via ftp://main.g2.bx.psu.edu? I'm on a 100+ Mbit line, but I only get around 5 Mbit upload.
              Last edited by puggie; 03-08-2012, 05:52 AM.


              • #8
                I've never used FTP with the public Galaxy - I've only ever uploaded small files by HTTP or copy and paste.

                Given that you are in Europe and the Galaxy server is somewhere in America, getting 5 Mbit uploads doesn't sound too bad. Perhaps ask on the galaxy-user mailing list whether this is typical?
