  • Question about Illumina reads, SNPs and mapping assemblies

    Hi,

    I am a newbie PhD student and I just started working with Illumina DNA reads (~80 bp). My main experience comes from 454 assemblies (using MIRA, newbler).

    I have some doubts about how to use the Illumina reads' information. Basically, I'd like to check the support (using coverage and quality) of the Illumina reads I have (30M+, single end) for each position of my 454 genome, eventually editing the genomic sequence if, let's say, 95% of the reads aligned on a particular position suggest the same mismatch.
    Does anybody know if there are tools that do this?
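The per-position check described above can be sketched as follows. This is only a minimal illustration, not any particular tool's method: the `pileup` dictionary (0-based position mapped to a list of `(base, phred_quality)` observations), the function names, and the `min_depth` and `min_qual` cut-offs are all assumptions made for the example; only the 95% agreement threshold comes from the question.

```python
from collections import Counter


def supported_edit(ref_base, observations, min_fraction=0.95,
                   min_depth=10, min_qual=20):
    """Return the base to substitute if the reads agree on a mismatch, else None.

    observations: list of (base, phred_quality) tuples covering one position.
    Low-quality observations are dropped before coverage is counted.
    """
    bases = [b.upper() for b, q in observations if q >= min_qual]
    depth = len(bases)
    if depth < min_depth:
        return None  # not enough trusted coverage to justify an edit
    base, count = Counter(bases).most_common(1)[0]
    if base != ref_base.upper() and count / depth >= min_fraction:
        return base
    return None


def polish(reference, pileup, **thresholds):
    """Apply every supported single-base edit to the reference string."""
    seq = list(reference)
    for pos, observations in pileup.items():
        new_base = supported_edit(seq[pos], observations, **thresholds)
        if new_base is not None:
            seq[pos] = new_base
    return "".join(seq)
```

For example, `polish("ACGT", {1: [("T", 30)] * 20})` replaces the second base, while a position covered by only a handful of reads is left untouched. Insertions and deletions are not handled by this sketch.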

    Also, I could use the Illumina reads for a mapping assembly. I have never done one, because I never managed to get MIRA through it without crashing. Does anybody know if there are tools that not only map short reads onto a reference sequence but also INTEGRATE the mapped assembly with the reference?

    I am a little bit lost. Any comment or suggestion is welcome!

  • #2
    For a reference mapping re-sequencing style assembly you could take a look at Mosaik.



    • #3
      Hi, thanks for the advice. I am trying it, but MosaikAligner estimates 181 days to process my 35M Illumina reads. I already aligned the same reads to the reference genome with BWA and it was nowhere near this slow, so I can't understand the problem. Ideally I would only need MosaikAssembler, but my reads are in SAM format and Mosaik wants its own format. Do you know of a converter?
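To put the 181-day estimate in perspective, the implied throughput can be worked out directly. A small sketch using only the two figures quoted above; the function name is made up for the illustration.

```python
def implied_reads_per_second(total_reads, estimated_days):
    """Throughput implied by an aligner's time estimate."""
    seconds = estimated_days * 24 * 60 * 60
    return total_reads / seconds


rate = implied_reads_per_second(35_000_000, 181)
print(f"{rate:.2f} reads per second")
```

That comes to roughly two reads per second, which suggests a configuration problem rather than the normal speed of the aligner.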



      • #4
        I'm not sure if there is a converter from SAM/BAM to the Mosaik .dat format.

        What command are you using for MosaikAligner? Did you create a jump database using MosaikJump?

        The last time I mapped some 76 bp Illumina reads I used these options:

        Code:
        MosaikAligner -in in.dat -out out.dat -ia ref.dat -j ref_15 -hs 15 -mm 3 -act 35 -bw 29 -mhp 100 -p 12
        where -j ref_15 is the output of MosaikJump with -hs 15

        For MosaikAligner, set -p as high as you can on your machine. In general I've found Mosaik to be fairly fast, but not as fast as bowtie.



        • #5
          There is another option called nesoni, which I am checking out at the moment. It lets you map Illumina reads against 454 contigs with Shrimp2 and then attempts to integrate the output.
          I'm not sure I completely understand what it's doing, or how well it's working for my dataset, but it might be helpful for you.

          See Torst's posts on this forum.



          • #6
            Hi,
            thanks for the help.
            I found out that Velvet, since this summer, has a new module called Columbus for mapping assemblies:


            It worked well, or at least it seems to; I still have to check the contigs more carefully. But it didn't crash like MIRA, it wasn't limited to 2M reads like MAQ, and it didn't ask me to reduce memory usage with something like MosaikJump. Actually, do you know if there are any benchmarks for mapping assemblers?



            • #7
              Velvet rejects SAM file: "cannot contain reference sequences"

              Hi allcreation
              Would it be possible for you to give some directions on how you created the SAM file and reference sequence for your Velvet-Columbus run?
              I have tried to do exactly as it is described in the manual but I get the following error message:
              Code:
              [0.256032] SAM file r5p12t6_07testmap_novo.sorted.sam cannot contain reference sequences.
              I have my reference sequence in the format described there:
              Code:
              >contig00001:1-35524
              gACGCCGCGCGCCGCGGCCAGGGCTGGCCCACGGCCcTCTTCCGGCGCGCTGCGCAGGCG
              TTCGGCCAGGCCGCGCGGCGTCGGCTGGCTGAGCGCCCAGCGTAGCAGGCGATCGAACGG
              ATGCCGACGGGCGCTTTCCAGTCGTTCGCGCAAACGGGCGATCAACTGGGCGATCAACAG
              CGAGTCGCCGCCAGCCCCGAAGAAGTCTTGCTCGACGCCCAGCGACGGGTTGTCCAGCAC
              CTCCCGCCAGAGTGCCAGCAGCGCATTCTCCAGTTCGTCGGCCGGTGCCTGCGCGACGCC
              And my SAM file which I created with Novoalign was sorted like this:
              Code:
              sort SAMfile.sam > SAMfile.sorted.sam
              For the header in the FASTA file I used as the reference for the alignment, I have tried both this
              >contig00001:1-35524
              and this
              >contig00001
              but neither avoids the error message.


              So maybe you could show the headers of some of your input data?

              Best, s052866
              Last edited by s052866; 12-08-2010, 02:47 PM.



              • #8
                Hi,

                To create my SAM file I started from my FASTQ files and aligned them with BWA.

                The headers of my reference sequences looked like:

                >ref1:1-100000

                One thing I can think of is... do the reference sequences in your FASTA file and the ones named in your SAM file have exactly the same names? If they differ, Velvet may not be able to match the information between the reference and the SAM file.

                Edit: actually, the error message is telling you that the SAM file can't contain reference sequences... Are you using a command line like this?
                Code:
                velveth ./ 21 -reference ref.fasta -short -sam sorted.sam > Log.txt
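The name check suggested above can be done with a few lines of Python. This is a rough sketch under stated assumptions: the function names are invented for the example, and it only reports which reference names in the SAM alignment records (the third tab-separated column, RNAME) do not appear among the FASTA headers. It says nothing about which naming style Velvet itself expects.

```python
def fasta_names(lines):
    """Collect the first word of every '>' header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def sam_reference_names(lines):
    """Collect RNAME (column 3) from alignment records, skipping '@' headers."""
    names = set()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] != "*":  # '*' marks an unmapped read
            names.add(fields[2])
    return names


def unmatched_references(fasta_lines, sam_lines):
    """Names used in the SAM records that have no identical FASTA header."""
    return sorted(sam_reference_names(sam_lines) - fasta_names(fasta_lines))
```

Both functions accept any iterable of lines, so open file handles can be passed in directly, for example `unmatched_references(open("ref.fasta"), open("sorted.sam"))`. An empty list means every name in the alignments has an identical FASTA header.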



                • #9
                  Hi allcreation

                  I found out that my problem was that I did not have "-short" in my command line. Thanks.
