  • BFAST localalign SOLiD data

    Hi all, I am using BFAST to align SOLiD PE data.

    I had the following error:

    ************************************************************
    Checking input parameters supplied by the user ...
    Validating fastaFileName /state/partition1/genome/bfast/ucsc.hg19.fasta.
    Validating matchFileName/share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf.
    **** Input arguments look good! *****
    ************************************************************
    ************************************************************
    Printing Program Parameters:
    programMode: [ExecuteProgram]
    fastaFileName: /state/partition1/genome/bfast/ucsc.hg19.fasta
    matchFileName: /share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf
    scoringMatrixFileName: [Not Using]
    ungapped: [Not Using]
    unconstrained: [Not Using]
    space: [Color Space]
    startReadNum: 1
    endReadNum: 2147483647
    offsetLength: 20
    maxNumMatches: 384
    avgMismatchQuality: 10
    numThreads: 1
    queueLength: 25000
    timing: [Not Using]
    ************************************************************
    ************************************************************
    Reading in reference genome from /state/partition1/genome/bfast/ucsc.hg19.fasta.nt.brg.
    In total read 93 contigs for a total of 3137161264 bases
    ************************************************************
    ************************************************************
    Reading match file from /share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf.
    ************************************************************
    Performing alignment...
    Reads processed: 0
    ************************************************************

    In function "AlignColorSpaceGappedConstrained": Fatal Error[OutOfRange]. Message: read and reference did not match.


    Before alignment I built the references with fasta2brg in nt and cs space. I created the masks and the indexes as described in the manual (SOLiD section). I matched the .fastq file (obtained with solid2fastq from 2 .csfasta files + 2 .qual files) against the references and got a 3.5 Gb .bmf file. When I started the local alignment I got the error shown above.

    What am I doing wrong? I hope somebody can help me.

    Many thanks.

  • #2
    What were your match commands?


    • #3
      Originally posted by nilshomer
      What were your match commands?
      i) Index creation using 10 masks (as in the BFAST manual):

      $BFASTDIR/bfast/bfast index -n 8 -f $REFERENCEDIR/$REFERENCEFILE -m $MASK<1:10> -w 14 -i 7 -A 1

      10 .bif files created successfully (13 Gb each)

      ii) Matching step

      $BFASTDIR/bfast/bfast match -f $REFERENCEDIR/$REFERENCEFILE -A 1 -r $OUTDIR/$READSFILE.fastq > $OUTDIR/$READSFILE.matched.bmf

      REFERENCEFILE=ucsc.hg19.fa
      READSFILE=solid0121_20100616_PEcllSureSelect_CLL_11 (obtained by running solid2fastq with the .csfasta F3 and F5 files and the 2 related .qual files)

      Anything wrong?

      Thanks


      • #4
        I would take a look at the bfast+bwa branch for paired ends. To debug, I would need a small test case.
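A small test case like the one requested here could be cut from the head of the FASTQ file. The following is only a minimal sketch, not part of BFAST: it assumes unwrapped four-line FASTQ records and assumes that both ends of a pair carry the same read name on consecutive records, which should be checked against the actual solid2fastq output. The function names are hypothetical.

```python
import itertools
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (name, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record: %r" % record[:1])
        yield record


def first_read_groups(lines: Iterable[str], n_groups: int) -> List[str]:
    """Return the lines of the first n_groups read names.

    Assumption: all ends of a read share one name and are adjacent,
    so consecutive records with the same name are kept together.
    """
    out: List[str] = []
    seen = 0
    last_name = None
    for record in fastq_records(lines):
        name = record[0].split()[0]
        if name != last_name:
            if seen == n_groups:
                break  # the next read group would exceed the requested size
            seen += 1
            last_name = name
        out.extend(record)
    return out
```

With real data one would pass an open file handle as `lines` and write the returned lines to a new .fastq file, then rerun the match and localalign steps on that subset.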


        • #5
          I'm running localalign with the -U option and it's working. I will take a look at bwaaln and the related PE pipeline to compare results.

          I can also report that when I tested subsets of reads with -s/-e in localalign, I got good results for those debug intervals (I tested the interval 1:3000 and it worked). The error arose only when localalign ran on the entire dataset, right at the beginning of the computation (0 reads processed).
          I don't want to bother you more.

          Let me know what you think.


          Thanks for your interest

          Cu,
          Marco
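The -s/-e testing described in this post could be turned into a systematic search for the first read that triggers the error. The sketch below is only an illustration: `fails` is a hypothetical callable that would run localalign on reads `start..end` (for example via -s/-e) and return True if the run aborts. It also assumes the failure is deterministic and caused by one specific read, so that any interval containing that read fails.

```python
from typing import Callable, Optional


def first_failing_read(
    fails: Callable[[int, int], bool], start: int, end: int
) -> Optional[int]:
    """Binary-search for the smallest r such that fails(start, r) is True.

    Returns None if the whole interval [start, end] runs cleanly.
    """
    if not fails(start, end):
        return None
    lo, hi = start, end
    # Invariant: fails(start, hi) is True, and no r < lo makes fails(start, r) True.
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(start, mid):
            hi = mid  # the offending read is at or before mid
        else:
            lo = mid + 1  # reads start..mid are fine
    return lo
```

Each probe costs one localalign run, so about 31 runs would cover the default range up to endReadNum 2147483647. The read found this way, together with its pair, would make a natural small test case.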


          • #6
            The postprocess step, run after alignment with the -U option (which worked fine, I guess, and output a 4.5 Gb .baf file), gave me a segmentation fault.

            See the error message below:

            ************************************************************
            Checking input parameters supplied by the user ...
            Validating fastaFileName /state/partition1/genome/bfast/ucsc.hg19.fasta.
            Validating alignFileName /share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf.aln.baf.
            Input arguments look good!
            ************************************************************
            ************************************************************
            Printing Program Parameters:
            programMode: [ExecuteProgram]
            fastaFileName: /state/partition1/genome/bfast/ucsc.hg19.fasta
            alignFileName: /share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf.aln.baf
            algorithm: [Best Score]
            space: [Color Space]
            strandedness: [Opposite strand]
            positioning: [Read one first]
            pairing: [Paired End]
            avgMismatchQuality: 10
            scoringMatrixFileName: [Not Using]
            randomBest: [Not Using]
            minMappingQuality: -2147483648
            minNormalizedScore: -2147483648
            insertSizeAvg: 0.000000
            insertSizeStdDev: 0.000000
            numThreads: 8
            queueLength: 100000
            outputFormat: [SAM]
            outputID: [Not Using]
            RGFileName: [Not Using]
            baseQualityType: [MAQ-style]
            timing: [Not Using]
            ************************************************************
            ************************************************************
            Reading in reference genome from /state/partition1/genome/bfast/ucsc.hg19.fasta.nt.brg.
            In total read 93 contigs for a total of 3137161264 bases
            ************************************************************
            Postprocessing...
            ************************************************************
            Estimating paired end distance...
            Found only 0 distances to infer the insert size distribution
            ************************************************************

            In function "GetPEDBins": Warning[OutOfRange]. Variable/Value: b->numDistances.
            Message: Not enough distances to infer insert size distribution.
            ***** Warning *****
            ************************************************************
            /opt/torque/mom_priv/jobs/878.deepseq.unife.it.SC: line 30: 28843 Segmentation fault $BFASTDIR/bfast/bfast postprocess -n 8 -f $REFERENCEDIR/$REFERENCEFILE -i $OUTDIR/$ALNFILE.baf -A 1 -Y 0 > $OUTDIR/$ALNFILE.sam


            • #7
              Very interesting, can you create a small test case to debug?


              • #8
                Hi, I ran bfast+bwa on a CentOS cluster and got a segmentation fault again (during index creation). It ran fine on my Debian laptop. Are there any known issues related to CentOS?

                Thanks


                • #9
                  I am sorry, there is not enough information to debug.
