  • gringer
    replied
    Never convert colour space sequences to standard base space sequences prior to alignment. While you will definitely prefer the base-space representation of the alignment, converting first is going to introduce a lot of errors that aren't present in a colour space alignment.

    To see why this is a bad idea, see my previous verbose rants.
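
    A minimal Python sketch of why this happens, assuming the standard SOLiD two-base encoding (colour 0 keeps the base; colours 1, 2 and 3 swap A/C and G/T, A/G and C/T, A/T and C/G respectively) and reads without missing calls ('.'). The function name and the example reads are made up for illustration. Each base is decoded from the previous one, so a single miscalled colour changes every base after it, whereas in colour space it stays a single mismatch.

```python
# Hypothetical helper for illustration; not part of any aligner.
BASES = "ACGT"

def decode_colourspace(read):
    """Translate a colour-space read (primer base + colours) to base space."""
    prev = BASES.index(read[0])      # known primer base, e.g. 'T'
    out = []
    for colour in read[1:]:
        # With A=0, C=1, G=2, T=3 the next base is (previous base XOR colour).
        prev ^= int(colour)
        out.append(BASES[prev])
    return "".join(out)

good = "T0132132"
bad = "T0122132"   # same read with one colour miscalled (3 -> 2)

print(decode_colourspace(good))
print(decode_colourspace(bad))
```

    The two decoded reads agree on the first two bases and disagree on all five bases after the miscalled colour.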


  • Chipper
    replied
    SOLiD reads had a high raw error rate. Bowtie with default settings is not ideal for 75 bp reads, so 13% seems about right. BFAST will do a better job, but start with trimmed reads and change parameters v and n.


  • mastal
    replied
    Maybe there is something wrong with the way fastq-dump converted the data.

    You could try aligning with BFAST, which was specifically written to deal with SOLiD data.
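
    One way to sanity-check the conversion is sketched below in Python. It assumes the usual csfasta/qual layout (optional '#' comment lines, a '>' header line, then one line holding either the primer base plus colours or space-separated quality values), with records in the same order in both files. The function names are hypothetical, and reads are passed as lists of lines to keep the example self-contained.

```python
def read_records(lines):
    """Yield (name, payload) pairs from csfasta or qual lines, skipping '#' comments."""
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None

def find_mismatches(csfasta_lines, qual_lines):
    """Return names of reads whose colour count differs from their quality count."""
    problems = []
    pairs = zip(read_records(csfasta_lines), read_records(qual_lines))
    for (seq_name, seq), (qual_name, qual) in pairs:
        n_colours = len(seq) - 1      # the first character is the primer base
        n_quals = len(qual.split())
        if seq_name != qual_name or n_colours != n_quals:
            problems.append(seq_name)
    return problems

print(find_mismatches([">r1", "T0132"], [">r1", "20 18 25 9"]))
```

    Note that zip stops at the shorter input, so a file with missing records at the end would need a separate record-count check.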


  • znasim09
    replied
    Hey mastal,
    I used this command:
    bowtie -f -C a_thaliana test_F3.csfasta -Q test_F3_QV.qual > out_F3.sam

    But I am still getting the same result (only 13% mapping):

    # reads processed: 6221913
    # reads with at least one reported alignment: 855943 (13.76%)
    # reads that failed to align: 5365970 (86.24%)
    Reported 855943 alignments to 1 output stream(s)


  • mastal
    replied
    Put the -Q immediately before the name of the F3_QV.qual file.


  • znasim09
    replied
    I tried to add the qual file by using this command:
    perl bowtie -f -C -Q test_F3.csfasta a_thaliana test_F3_QV.qual > out_F3.sam

    But it gave the following info:

    Warning: could not parse quality line:
    111>SRR309164.1 1:1086:19215
    T013213212300222033031011....12...10...............................................
    >libc++abi.dylib: terminating with uncaught exception of type int
    Abort trap: 6


  • mastal
    replied
    When you do the alignment you also need to use the base qualities file, so add -Q Sample_F3_QV.qual to your command.


  • znasim09
    replied
    Hey mastal,

    I color indexed the reference genome with the command

    perl bowtie-build --wrapper basic-0 -C /Users/znasim09/Documents/Perl_packages/bowtie-1.1.2/genomes/Ath_reference.fa output.ebwt

    It generated 6 files (as I was expecting).
    Then I aligned the csfasta file using this command

    bowtie -f -C -S a_thaliana test_F3.csfasta > test_F3.sam

    It runs, and gives this info

    # reads processed: 6221913
    # reads with at least one reported alignment: 823401 (13.23%)
    # reads that failed to align: 5398512 (86.77%)
    Reported 823401 alignments to 1 output stream(s)

    What could be the reasons for such a high percentage of unaligned reads?
    Plus, the out.sam file has a size of zero kB. I don't know why.


  • mastal
    replied
    See the list of options in the Bowtie manual, particularly options -C, and -Q, and note that you have to build a colorspace Bowtie index.

    See the 'Getting Started' section of the Bowtie webpages, but because you have colorspace reads you need to build a colorspace index,
    and also use the options -C and -Q that indicate you have colorspace reads and your base quality values are in a separate file.

    There are 2 steps to running bowtie:
    1. Make an index of your genome.
    2. Run the alignment.

    To run the bowtie alignment, the basics are:
    1. You have to specify the path to the files with the bowtie index.
    2. You have to specify the path to the files with the reads (and the base qualities) you want to align.
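
    The two steps can be sketched as a small Python helper that only builds the argument lists and does not run anything. It assumes a Bowtie 1 release that supports colorspace (such as the 1.1.2 used in this thread) and uses only options that appear in this thread (-C, -f, -Q, -S). The function names are hypothetical, and the file names are the example names from the posts.

```python
import shlex

def bowtie_build_cmd(reference_fasta, index_prefix):
    """Step 1: build a colorspace (-C) index from the reference genome."""
    return ["bowtie-build", "-C", reference_fasta, index_prefix]

def bowtie_align_cmd(index_prefix, csfasta, qual, sam_out):
    """Step 2: align single-end colorspace reads and write SAM output.

    Order follows the usage line: options, index prefix, reads, output file.
    The qual file name comes immediately after -Q.
    """
    return ["bowtie", "-f", "-C", "-S", "-Q", qual, index_prefix, csfasta, sam_out]

print(shlex.join(bowtie_build_cmd("Ath_reference.fa", "a_thaliana")))
print(shlex.join(bowtie_align_cmd("a_thaliana", "Sample_F3.csfasta",
                                  "Sample_F3_QV.qual", "Sample_F3.sam")))
```

    Printing the commands first makes it easy to check the argument order before running them in a terminal.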
    Last edited by mastal; 04-07-2016, 03:24 AM.


  • znasim09
    replied
    @Mastal
    I read the colorspace alignment section quite a few times but couldn't understand it properly. The data I want to analyze is single-end.
    I downloaded the SRA file and converted it to csfasta and qual using the SRA Toolkit (abi-dump). That's why I asked if anyone can refer me to a video tutorial or modify the command.


  • mastal
    replied
    Do you have single end or paired-end reads?

    Have you seen the section of the Bowtie manual about aligning colorspace reads?



    If you have paired-end reads, the _F3.csfasta file (forward reads) corresponds to -1, the first reads of a pair, and the _R3.csfasta file (reverse reads) corresponds to -2, the second reads of a pair.


  • znasim09
    replied
    @dpryan
    Thanks for the reply.
    I indexed the Arabidopsis genome using the script "make_a_thaliana_tair.sh".
    But there are plenty of options that I couldn't understand (as I am a beginner). Can you please modify this command

    bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

    for example data, e.g. Sample_F3.csfasta and Sample_F3_QV.qual?


  • dpryan
    replied
    Make your life easier and just use an aligner (e.g., bowtie) that can directly handle these types of files.


  • Any video tutorial for alignment of csfasta and qual files?? Or its fastq conversion?

    Hello everyone,

    For the last few days I have been searching different forums to understand how to convert csfasta and qual files to fastq and/or align them. But because of my limited (or, I can say, no) knowledge of scripting, I was unable to do so.

    That's why I would like to watch a video tutorial covering these steps. Is there one? I want to convert/align Arabidopsis 1001 Genomes data.
