  • mnfuser
    Junior Member
    • Apr 2012
    • 6

    BFAST localalign SOLiD data

    Hi all, I am using BFAST to align SOLiD PE data.

    I had the following error:

    ************************************************************
    Checking input parameters supplied by the user ...
    Validating fastaFileName /state/partition1/genome/bfast/ucsc.hg19.fasta.
    Validating matchFileName/share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf.
    **** Input arguments look good! *****
    ************************************************************
    ************************************************************
    Printing Program Parameters:
    programMode: [ExecuteProgram]
    fastaFileName: /state/partition1/genome/bfast/ucsc.hg19.fasta
    matchFileName: /share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf
    scoringMatrixFileName: [Not Using]
    ungapped: [Not Using]
    unconstrained: [Not Using]
    space: [Color Space]
    startReadNum: 1
    endReadNum: 2147483647
    offsetLength: 20
    maxNumMatches: 384
    avgMismatchQuality: 10
    numThreads: 1
    queueLength: 25000
    timing: [Not Using]
    ************************************************************
    ************************************************************
    Reading in reference genome from /state/partition1/genome/bfast/ucsc.hg19.fasta.nt.brg.
    In total read 93 contigs for a total of 3137161264 bases
    ************************************************************
    ************************************************************
    Reading match file from /share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf.
    ************************************************************
    Performing alignment...
    Reads processed: 0************************************************************

    In function "AlignColorSpaceGappedConstrained": Fatal Error[OutOfRange]. Message: read and reference did not match.


    Before alignment I built the references with fasta2brg in both nucleotide space (nt) and color space (cs). I created the masks and the indexes as described in the manual (SOLiD section). I matched the .fastq file (produced by solid2fastq from 2 .csfasta files + 2 .qual files) against the references and got a 3.5 Gb .bmf file. When I started the local alignment, I got the error shown above.

    What am I doing wrong? I hope somebody can help me.

    Many thanks.
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    What were your match commands?

    • mnfuser
      Junior Member
      • Apr 2012
      • 6

      #3
      Originally posted by nilshomer:
      What were your match commands?
      i) Index creation using 10 masks (as in the BFAST manual):

      $BFASTDIR/bfast/bfast index -n 8 -f $REFERENCEDIR/$REFERENCEFILE -m $MASK<1:10> -w 14 -i 7 -A 1

      10 .bif files created successfully (13 Gb each)

      ii) Matching step

      $BFASTDIR/bfast/bfast match -f $REFERENCEDIR/$REFERENCEFILE -A 1 -r $OUTDIR/$READSFILE.fastq > $OUTDIR/$READSFILE.matched.bmf

      REFERENCEFILE=ucsc.hg19.fa
      READSFILE=solid0121_20100616_PEcllSureSelect_CLL_11 (obtained by running solid2fastq on the F3 and F5 .csfasta files and the 2 related .qual files)

      Anything wrong?

      Thanks
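The `$MASK<1:10>` notation in the index command presumably stands for ten separate `bfast index` runs, one per mask. A minimal dry-run sketch of such a loop follows, assuming `-i` is meant to carry a distinct index number for each mask (the post shows only `-i 7`). The function name `print_index_commands` and the `MASK_A`, `MASK_B`, `MASK_C` placeholders are hypothetical; the real masks come from the BFAST manual and are not reproduced here. The sketch only prints the commands and never runs BFAST.

```shell
#!/bin/sh
# Dry run: print one `bfast index` command per mask instead of executing it.
# Usage: print_index_commands REFERENCE MASK [MASK ...]
print_index_commands() {
    ref=$1
    shift
    i=1
    for mask in "$@"; do
        # -n, -w and -A are copied from the post; giving each mask its own
        # -i value is an assumption about what <1:10> abbreviates.
        echo "bfast index -n 8 -f $ref -m $mask -w 14 -i $i -A 1"
        i=$((i + 1))
    done
}

# Placeholder masks, NOT the real ones from the BFAST manual.
print_index_commands ucsc.hg19.fa MASK_A MASK_B MASK_C
```

Printing the commands first makes it easy to confirm that every mask gets its own index number before committing to ten long index builds.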

      • nilshomer
        Nils Homer
        • Nov 2008
        • 1283

        #4
        I would take a look at the bfast+bwa branch for paired ends. To debug, I would need a small test case.


        • mnfuser
          Junior Member
          • Apr 2012
          • 6

          #5
          I am now running localalign with the -U option and it works. I will look at bwaaln and the related PE pipeline to compare results.

          I can also report that when testing read subsets with -s/-e in localalign, I got good results for those debug intervals. The error arose when localalign ran on the entire dataset, at the very beginning of the computation (0 reads processed), whereas the interval 1:3000 worked.
          I don't want to bother you further.

          Let me know what you think.


          Thanks for your interest.

          Cu,
          Marco
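Since reads 1:3000 align cleanly but the full run fails, the `-s`/`-e` options could be used to bisect down to a single offending read and so produce the small test case requested earlier in the thread. The sketch below is one way to do that, under several assumptions: the failure is deterministic, exactly one read in the range triggers it, and `bfast localalign` exits non-zero on a fatal error. The names `run_range`, `first_failing_read`, `REF` and `BMF` are hypothetical, and the `-m` option for the match file is assumed rather than taken from the posts.

```shell
#!/bin/sh
# run_range START END: hypothetical wrapper around localalign for one read range.
# Assumes REF and BMF are set and that bfast exits non-zero on a fatal error.
run_range() {
    bfast localalign -f "$REF" -m "$BMF" -A 1 -s "$1" -e "$2" > /dev/null 2>&1
}

# first_failing_read CHECK LO HI: bisect the failing range [LO, HI].
# CHECK is a command that takes START END and succeeds when that range is clean.
# Assumes [LO, HI] fails as a whole and contains exactly one bad read.
first_failing_read() {
    check=$1 lo=$2 hi=$3
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( lo + (hi - lo) / 2 ))
        if "$check" "$lo" "$mid"; then
            lo=$((mid + 1))   # lower half is clean, so the bad read is above mid
        else
            hi=$mid           # lower half fails, so the bad read is at or below mid
        fi
    done
    echo "$lo"
}
```

A call such as `first_failing_read run_range 3001 25000` would then print one read number. The upper bound of 25000 is a guess based on `queueLength: 25000` in the log: the counter may still show 0 while the first queue of reads is being aligned, so the failing read could lie in that first batch.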


          • mnfuser
            Junior Member
            • Apr 2012
            • 6

            #6
            The postprocess step, run after alignment with the -U option (which seemed to work fine and produced a 4.5 Gb .baf file), gave me a segmentation fault.

            See the error message below:

            ************************************************************
            Checking input parameters supplied by the user ...
            Validating fastaFileName /state/partition1/genome/bfast/ucsc.hg19.fasta.
            Validating alignFileName /share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf.aln.baf.
            Input arguments look good!
            ************************************************************
            ************************************************************
            Printing Program Parameters:
            programMode: [ExecuteProgram]
            fastaFileName: /state/partition1/genome/bfast/ucsc.hg19.fasta
            alignFileName: /share/GAMES/data/test20120501/solid0121_20100616_PEcllSureSelect_CLL_11.matched.bmf.aln.baf
            algorithm: [Best Score]
            space: [Color Space]
            strandedness: [Opposite strand]
            positioning: [Read one first]
            pairing: [Paired End]
            avgMismatchQuality: 10
            scoringMatrixFileName: [Not Using]
            randomBest: [Not Using]
            minMappingQuality: -2147483648
            minNormalizedScore: -2147483648
            insertSizeAvg: 0.000000
            insertSizeStdDev: 0.000000
            numThreads: 8
            queueLength: 100000
            outputFormat: [SAM]
            outputID: [Not Using]
            RGFileName: [Not Using]
            baseQualityType: [MAQ-style]
            timing: [Not Using]
            ************************************************************
            ************************************************************
            Reading in reference genome from /state/partition1/genome/bfast/ucsc.hg19.fasta.nt.brg.
            In total read 93 contigs for a total of 3137161264 bases
            ************************************************************
            Postprocessing...
            ************************************************************
            Estimating paired end distance...
            Found only 0 distances to infer the insert size distribution
            ************************************************************

            In function "GetPEDBins": Warning[OutOfRange]. Variable/Value: b->numDistances.
            Message: Not enough distances to infer insert size distribution.
            ***** Warning *****
            ************************************************************
            /opt/torque/mom_priv/jobs/878.deepseq.unife.it.SC: line 30: 28843 Segmentation fault $BFASTDIR/bfast/bfast postprocess -n 8 -f $REFERENCEDIR/$REFERENCEFILE -i $OUTDIR/$ALNFILE.baf -A 1 -Y 0 > $OUTDIR/$ALNFILE.sam
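The log reports 0 distances available for inferring the insert size, which may mean that postprocess found no read with both ends usable. One thing that could be checked, as a guess rather than a diagnosis, is whether the FASTQ written by solid2fastq really contains two ends per read name. The sketch below counts how many read names occur once, twice, and so on. It assumes plain four-line FASTQ records in which both ends of a pair share the same header line; the function name `count_ends` is hypothetical. It does not explain the segmentation fault itself.

```shell
#!/bin/sh
# count_ends FASTQ: print "<ends-per-name> <number-of-names>" lines, sorted.
# Assumes four-line FASTQ records and identical header lines for paired ends.
count_ends() {
    awk 'NR % 4 == 1 { n[$1]++ }
         END {
             for (k in n) c[n[k]]++      # tally names by how often they occur
             for (e in c) print e, c[e]
         }' "$1" | sort -n
}
```

For properly paired input, `count_ends reads.fastq` would be expected to print a single line starting with `2`; a large count on a line starting with `1` would suggest that many reads lost their mate before matching.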


            • nilshomer
              Nils Homer
              • Nov 2008
              • 1283

              #7
              Very interesting, can you create a small test case to debug?


              • mnfuser
                Junior Member
                • Apr 2012
                • 6

                #8
                Hi, I ran bfast+BWA on a CentOS cluster and got a segmentation fault again (during index creation). It ran fine on my Debian laptop. Are there any known issues related to CentOS?

                Thanks


                • nilshomer
                  Nils Homer
                  • Nov 2008
                  • 1283

                  #9
                  I am sorry, there is not enough information to debug.
