
  • newbler2.6 454 and illumina seq, help

    Hi all,

    I am trying to assemble a transcriptome from 454 and Illumina sequences using newbler 2.6. runAssembly finished without error. It loads the sequences, but no Illumina reads were aligned, only the 454 ones. It seems the program doesn't use the Illumina file in the assembly: in tests on that file alone, the analysis finishes in only 6 seconds. We have tried different parameters and options without success. Could you help me?

    Thanks,

    Ximo

    /runAssembly -cdna -mi 95 -ml 20 illumina_.sfastaq 454_.sfastaq

    454NewblerMetrics.txt: readAlignmentResults
    {
    file
    {
    path = "/illumina_.sfastq";

    numAlignedReads = 0, 0.00%;
    numAlignedBases = 0, 0.00%;
    inferredReadError = 0.00%, 0;

    To check whether runAssembly can read my Illumina FASTQ sequences, I ran this test:

    /runAssembly -cdna -mi 95 -ml 50% illumina_test_file.fastaq

    output:

    >Created assembly project directory newbler_test
    >1 read file successfully added.
    > test_100000_ill (Fastq dataset, with standard scores)
    >Assembly computation starting at: Tue Mar 27 12:35:19 2012 (v2.6 (20110517_1502))
    >Indexing/Screening test_100000_ill (with quality scores)...
    > -> 100000 reads, 3668500 bases.
    >Building contigs/isotigs...
    > -> 0 large contigs, 0 all contigs
    > -> 0 isogroups, 0 isotigs
    >Computing signals...
    > -> 0 of 0...
    >Checkpointing...
    >Generating output...
    > -> 0 of 0...
    >Assembly computation succeeded at: Tue Mar 27 12:35:23 2012

    runAssembly can read my sequences (test without the -cdna option):

    /runAssembly -mi 95 -ml 50% -urt illumina_test_file.fastaq

    runAssembly -o newbler_test test_file.100000_ill [12:41:44]

    Output:
    >Created assembly project directory newbler_test
    >1 read file successfully added.
    >test_100000_ill (Fastq dataset, with standard scores)
    >Assembly computation starting at: Tue Mar 27 13:02:27 2012 (v2.6 (20110517_1502))
    >Indexing test_100000_ill (with quality scores)...
    > -> 100000 reads, 3668500 bases.
    > Warning: Suspected 5' primer AAGCAGTGGTATCAACGCAGAGTAC, 15773 exact matches found.
    > Warning: Suspected 5' primer AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTT, 8227 exact matches found.
    > Warning: Suspected 3' primer GTACTCTGCGTTGATACCACTGCTT, 2397 exact matches found.
    > Warning: Suspected 3' primer AAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT, 8227 exact matches found.
    >Building contigs/scaffolds...
    > -> 0 large contigs, 0 all contigs
    >Computing signals...
    -> 0 of 0...
    >Checkpointing...
    >Generating output...
    > -> 0 of 0...
    >Assembly computation succeeded at: Tue Mar 27 13:02:29 2012

  • #2
    It may not be too helpful, but we used illumina and 454 together in a genome project.

    We used gsAssembler to build 454 contigs, then Velvet to assemble Illumina.

    Afterwards we included mapped fragments <2000bp of the Illumina assembly as fake reads
    in Newbler.

    It didn't improve results too much however, so we ended up using SSPACE with PE Illumina reads and 454 contigs.

    • #3
      Looks like something is wrong with your fastq file. Could you post the first few lines (say 12 or 16)?

      • #4
        Originally posted by flxlex:
        Looks like something is wrong with your fastq file. Could you post the first few lines (say 12 or 16)?
        I have used this file with MIRA and BWA without problems, but here are the first lines:

        Thanks

        @CUES000161
        AGAGAATCACCTGCTCAGTACAAAAATAATGACGCCCA
        +
        ######################################
        @CUES000162
        AAGCAGTGGCATCAACGCAGAGTACGC
        +
        GG5>3C;AC<DD=DDFFFAD@?79<><
        @CUES000163
        AGATTGTTGCCTGGATTATGATATGATACAATACAAAT
        +
        HHGHHHHGFH?HHHHHH0HHHHADHCHHHHEHGHHH=H
        @CUES000164
        TCTTGTTGTTCGAGTCAATAGGAGCTGTACTCTGTACT
        +
        FEFEFFFEFFEE:<FEE:EEFFFBFFEFF>G:F@=CCE
        @CUES000165
        GATATGTTTGTAGGAATTTTCTTGAACTTTTTACCAAT
        +
        GGGGGGCCCG3FCDD55544GGBBGBGGGGGGGGGGFE
        @CUES000166
        CTTTGCTTCTTCAGTTCAAATTGGAATTTGAGCTCGGA
        +
        C>@AC3CCCCA>.@<[email protected]
        @CUES000167
        ATTGGATATTTTTGTTAAATTATGTTTGTTCCAAAGAT
        +
        HHGHHGGHHHHHEEEEEHHHHHHHHHHHHHHHHHHHGA
        @CUES000168
        TATACTTATGTACAAGACGCTGTTATTGATATTAAATC
        +
        GHHCHHHHHHHHHGGHHFHHGHHE8EDFFFBHHGF1DA
        @CUES000169
        AGAATGTGAACCCACACACACAGCCATTTGGATCACTT
        +
        AEGDGGGGGDGGFEGEGEECG2GCGCCGGGGFGGCGCG
        @CUES000170
        CGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTT
        +
        ?>4?CC8;<F3)A@3DBBD5A459<FF??FBBBBBBBB
        @CUES000171
        TTGGAGGAAAGTTCAGCCATCCCAATAATGAAAGAGAT
        +
        ?FFFDFFDGGDG?FGAGGADDGG=AC5?CC2DDD=AFF
        @CUES000172
        GATGAACATTTTAAAATCTTAATTCCTCCAATTTGGAT
        +
        CCCCCAGAGGGGGGCCGFGGGGGGGGGGGGGGGGGGGG
        @CUES000173
        GGTATGGGTGAGTTTGGTGATCGTTACTTCGGAACTGA
        +
        HHGHHHHHEHEHHHHHHDHBFGGFG@FGGFHHHEHHHE
        @CUES000174
        TTCCAAAGGGGTCGCCTTTTCAATCTCCACCATTCATG
        +
        GGGDDC;CCCGCGEGG?EGCEBBEEGGB7GFBEFG?0D
        @CUES000175
        ATCCAACTGCTGTGGAAGGCCGTCTCCTTTCAGTCAGC
        +
        ==<<;1@9@>=E@EEHHACHHHHHHHHHHDH?HHEHHH
        @CUES000176
        GAGAAGGGTTATCAGATCATGATTCCTTTCTTTGATTG
        +
        BGHGHHGCDHHDFHFGHHHEHAHHHCHHHBHHCCFHHH
        @CUES000177
        TATATTCTTCGGGCAGCCGCCATTAAAGCTTTGGGATC
        +
        FFF?FAF?FFDDGAGDFG?DA=G=5C/=ACGG?.GAA=
        @CUES000178
        AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTT
        +
        HGHHHHHHHGEHHHHHHHHHHHHHH8=DAD=;<6<>C=
        @CUES000179
        TGAATTTCTATCTACAAACATGAACAATACCAATCTCT
        +
        DDADAD5@@AFFFFF>>;?>GD55A>>>?;:A/AD;A?
        @CUES000180
        AGCAGCCTCCACGTATGAACTCATCGTCACGTTAGATT
        +
        HGGEDHHHHHEHHHHHH>HHCECHHHHHDDHFEBF3<A

        • #5
          Sorry, nothing is wrong with your file of course. However, newbler will not recognize it. It expects this header style:

          Read 1:
          Code:
          @EAS139_FC706VJ:2:2104:15343:197393#0/1
          GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
          +
          IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
          Read 2 (in a separate file):
          Code:
          @EAS139_FC706VJ:2:2104:15343:197393#0/2
          CGATGGTCGTTTCGGAAGATGACGTGAATTGCCTGG
          +
          IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
          The /1 and /2 at the end tell newbler the pairing info.

          One solution would be to adjust the header. Alternatively, you could convert your files to fasta + qual files and include the pairing information in the header, as I explain in my blog post.
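
A minimal sketch of the "adjust the header" option, for reads that really are paired. It only appends `/1` or `/2` to each read name. Whether that suffix alone is enough for newbler to recognize the pairing is an assumption, since the example headers above also carry a `#0` index. The function name `add_mate_suffix` is hypothetical.

```python
from typing import Iterable, Iterator


def add_mate_suffix(lines: Iterable[str], mate: int) -> Iterator[str]:
    """Append /1 or /2 to the read name on every FASTQ header line.

    Assumes strict 4-line records, so only every fourth line (starting
    with the first) is treated as a header. This avoids mistaking a
    quality line that happens to begin with '@' for a header.
    """
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    for index, line in enumerate(lines):
        text = line.rstrip("\r\n")
        if index % 4 == 0:
            if not text.startswith("@"):
                raise ValueError(f"line {index + 1} is not a FASTQ header: {text!r}")
            name = text.split()[0]  # keep the read name, drop any description
            if not name.endswith(f"/{mate}"):
                name = f"{name}/{mate}"
            text = name
        yield text + "\n"


if __name__ == "__main__":
    record = ["@CUES000162\n", "AAGCAGTGGCATCAACGCAGAGTACGC\n", "+\n",
              "GG5>3C;AC<DD=DDFFFAD@?79<><\n"]
    print("".join(add_mate_suffix(record, 1)), end="")
```

Run once per file, with `mate=1` for the forward reads and `mate=2` for the reverse reads, so that both files share read names up to the suffix.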

          • #6
            flxlex,

            Thanks for your help and for your useful post, but my reads are not paired-end. Do you know if Newbler works with non paired-end Illumina data?

            Thanks

            Ximo

            • #7
              Not if they are that short. Newbler's minimum read length is 50 bases, which I now see is why your 36 base reads did not assemble. You could try setting the minlen parameter to your read length. But don't try to assemble the Illumina reads only using newbler, it is not built for such short reads...
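
A quick way to confirm that read length is the issue is to count how many reads fall below the 50-base threshold before running the assembler. This is a minimal sketch, assuming plain 4-line FASTQ records; the function names are hypothetical and not part of newbler.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from strict 4-line FASTQ records.

    A trailing incomplete record (fewer than 4 lines) is silently ignored.
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        header = header.rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"not a FASTQ header: {header!r}")
        yield header[1:], seq.rstrip("\r\n"), qual.rstrip("\r\n")


def count_short_reads(lines: Iterable[str], min_len: int = 50) -> Tuple[int, int, Counter]:
    """Return (total reads, reads shorter than min_len, length histogram)."""
    lengths = Counter(len(seq) for _, seq, _ in fastq_records(lines))
    total = sum(lengths.values())
    short = sum(n for length, n in lengths.items() if length < min_len)
    return total, short, lengths


if __name__ == "__main__":
    # One record from the sample posted in this thread, held in memory.
    sample = ["@CUES000162\n", "AAGCAGTGGCATCAACGCAGAGTACGC\n", "+\n",
              "GG5>3C;AC<DD=DDFFFAD@?79<><\n"]
    total, short, _ = count_short_reads(sample)
    print(f"{short} of {total} reads are shorter than 50 bases")
```

For a real file, pass an open file handle instead of the in-memory list, for example `count_short_reads(open(path))`.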

              • #8
                454 newbler runMapping alignment

                Hello,

                Does anyone know whether 454 runMapping performs local or global alignment?

                Any information on how its alignment algorithm works would be helpful.

                Thanks

                • #9
                  Originally posted by flxlex:
                  Not if they are that short. Newbler's minimum read length is 50 bases, which I now see is why your 36 base reads did not assemble. You could try setting the minlen parameter to your read length. But don't try to assemble the Illumina reads only using newbler, it is not built for such short reads...
                  I have tested this parameter, but I get the same result. When I use the 454 and Illumina reads together, the assembly runs, but in 454ReadStatus.txt the Illumina reads are all labeled as TooShort.

                  runAssembly -ml 50% -mi 95 -minlen 15 -o newbler_test test_100000_ill test_100000_454


                  Any suggestion?
                  Thanks
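
To see at a glance how many reads ended up in each category, the status column of 454ReadStatus.txt can be tallied. This is a rough sketch that assumes the file is tab-separated with one header row and the status in the second column. That layout is an assumption, not something shown in this thread, so check it against your own file first. The function name `count_read_status` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_read_status(lines: Iterable[str]) -> Counter:
    """Count values in the second tab-separated column, skipping the header row."""
    counts: Counter = Counter()
    for number, line in enumerate(lines):
        fields = line.rstrip("\r\n").split("\t")
        if number == 0 or len(fields) < 2:
            continue  # header row, blank line, or malformed line
        counts[fields[1]] += 1
    return counts


if __name__ == "__main__":
    # In-memory stand-in for the file; the header text is illustrative only.
    demo = ["Accno\tRead Status\n",
            "CUES000161\tTooShort\n",
            "CUES000162\tTooShort\n"]
    for status, n in count_read_status(demo).most_common():
        print(status, n)
```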

                  • #10
                    Oops... I had forgotten that reads between minlen and 50 bases are only used when there is at least one read dataset that newbler recognizes as paired end (i.e. mate pair, long insert library). In your case, I don't think you can use newbler for your short reads. Perhaps you can assemble the Illumina reads into contigs using something like Velvet, and use those contigs as reads for a contigs + 454 reads assembly?

                    • #11
                      Ok. Thanks a lot

                      Ximo

                      • #12
                        I just saw that newbler 2.7, which just came out, has a new flag: -short "Force use of reads shorter than 50 bp in projects that don't include any paired end data. Reads shorter than 50bp are automatically used if any paired-end data is used in the project. The lower limit is 20 bp (or minlen if -minlen is used)."

                        So, I advise you to try to get your hands on this version (through the Roche website)!
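
The length rule, as described across this thread, can be restated as a small predicate. This is only an illustration of the posts' description, not newbler's actual code; the function and parameter names are hypothetical, and behaviour for a minlen above 50 is a guess.

```python
from typing import Optional

DEFAULT_MIN_LEN = 50    # default minimum read length mentioned in post #7
SHORT_LOWER_LIMIT = 20  # lower limit quoted from the 2.7 -short description


def read_is_used(read_len: int,
                 minlen: Optional[int] = None,
                 has_paired_end: bool = False,
                 short_flag: bool = False) -> bool:
    """Return True if a read of this length would be used, per the thread.

    - Reads below the lower limit (minlen if given, else 20) are never used.
    - Reads of 50 bases or more are used.
    - Reads in between are used only if the project contains paired-end
      data or, from newbler 2.7 on, if -short is given.
    """
    lower = minlen if minlen is not None else SHORT_LOWER_LIMIT
    if read_len < lower:
        return False
    if read_len >= DEFAULT_MIN_LEN:
        return True
    return has_paired_end or short_flag


if __name__ == "__main__":
    # The 36-base reads from this thread, run with -minlen 15:
    print(read_is_used(36, minlen=15))                   # no paired-end data, no -short
    print(read_is_used(36, minlen=15, short_flag=True))  # with the 2.7 -short flag
```

The first call corresponds to the run in post #9, where all Illumina reads were marked TooShort; the second corresponds to the same run with the 2.7 flag.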
