What are these BC files for? Should I use them in alignment?

    Hi, folks. I've started working with some old SOLiD single-end RNA-seq reads from 2010 and 2011. I'm using novoalignCS and have the quality files, the csfasta files, and an additional fasta-like file with "BC" in the filename. Here's an example of the filenames:

    S1001938CIS_7/primary.20100713155754122/reads/
    solid0015_20100706_S1001938CIS_BC_bcSample1_F3.csfasta
    solid0015_20100706_S1001938CIS_BC_bcSample1_F3.stats
    solid0015_20100706_S1001938CIS_BC_bcSample1_F3_QV.qual
    S1001938CIS_7/primary.20100707172116543/reads/
    solid0015_20100706_S1001938CIS_BC_bcSample1_BC.csfasta

    The F3.csfasta and F3_QV.qual files are as expected, and work fine with novoalignCS.

    The BC.csfasta files contain data like the following, and I'm completely mystified as to what they are:

    # Wed Jul 7 11:02:39 2010 /share/apps/corona/bin/filter_fasta.pl --output=/data/results/solid0015/solid0015_20100706_S1001938CIS_BC/bcSample1/results.F1B1/primary.20100707172116543 --name=solid0015_20100706_S1001938CIS_BC_bcSample1 --tag=BC --minlength=5 --mincalls=25 --prefix=G /data/results/solid0015/solid0015_20100706_S1001938CIS_BC/bcSample1/jobs/postPrimerSetPrimary.1505/rawseq
    # Cwd: /home/pipeline
    # Title: solid0015_20100706_S1001938CIS_BC_bcSample1
    # Library:S1001938CIS_7:00313
    >1_223_2_BC 0
    G00313
    >1_238_37_BC 0
    G00313
    >1_240_14_BC 0
    G00313

    Anyone know? Should I be using these BC files in some way? They have extremely little information content.
    Sam Hokin
    Computational Scientist, Carnegie and NCGR

  • #2
    Well, the main thing is your csfasta and qual files work fine.

    Could BC stand for barcode? The reads are short, as barcodes are, and the format looks like csfasta.

    Not sure I ever saw these in my days of SOLiD adventures, which ended in 2012 (thank god).
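    One quick way to test the barcode idea is to tally the distinct sequences in a BC.csfasta file. The sketch below assumes each record is a single '>' header line followed by one colour-space line (a primer base such as "G" plus colour calls), and that '#' lines are comments, as in the excerpt in the question. The function names read_csfasta and tally_barcodes are hypothetical helpers, not part of any SOLiD or Novocraft tool.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_csfasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, colour_string) pairs from csfasta-style lines.

    Assumes one sequence line per record. Comment lines starting with '#'
    are skipped, and only the first token of a '>' header is kept as the ID
    (so '>1_223_2_BC 0' becomes '1_223_2_BC').
    """
    read_id = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or pipeline comment header
        if line.startswith(">"):
            read_id = line[1:].split()[0]
        elif read_id is not None:
            yield read_id, line
            read_id = None  # expect a new header before the next sequence


def tally_barcodes(lines: Iterable[str]) -> Counter:
    """Count how often each colour-space string occurs in the file."""
    return Counter(seq for _, seq in read_csfasta(lines))


if __name__ == "__main__":
    # The records quoted in the question, used here as sample input.
    sample = [
        "# Library:S1001938CIS_7:00313",
        ">1_223_2_BC 0",
        "G00313",
        ">1_238_37_BC 0",
        "G00313",
        ">1_240_14_BC 0",
        "G00313",
    ]
    print(tally_barcodes(sample))
```

    On a real file you would pass an open file handle instead, e.g. `with open("solid0015_20100706_S1001938CIS_BC_bcSample1_BC.csfasta") as fh: print(tally_barcodes(fh))`. If nearly every read carries the same string, and its colour calls (00313 here) match the end of the "# Library:" line, that is consistent with, though not proof of, barcode reads for a sample that has already been demultiplexed, in which case the F3 files alone would be enough for alignment.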



    • #3
      Hey, the data turned out OK and was useful! But yeah, I had to dust off the hard drive the reads were on, it'd been sitting on a shelf for eight years or so.
      Sam Hokin
      Computational Scientist, Carnegie and NCGR



      • #4
        Wow, that is some really good work you've been doing there. I was closely associated with some ENCODE Project work on RNA-seq measurements.
