
  • blancha
    replied
    Thank you Brian.
    You are correct in pointing out that the only problem with the format is the + sign on every other line.
    The + just corresponds to the paired FASTQ read.
    If this was the only issue I had with Rockhopper, I would be happy.
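For readers who only need to repair the read names, here is a minimal Python sketch of one possible workaround. It is not part of Rockhopper or samtools. It assumes the unsorted Rockhopper output, where each "+" record directly follows its mate as in the excerpt from the original post; the function names `fix_read_names` and `fix_sam_file` are hypothetical. It also trims each read name at the first space, on the assumption that the part after the space is the FASTQ comment, since the SAM specification does not allow whitespace in QNAME.

```python
def fix_read_names(lines):
    """Yield SAM lines where a '+' read name is replaced by the previous record's name."""
    previous_name = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: text after the first space is the FASTQ comment, so drop it.
        name = fields[0].split()[0]
        if name == "+" and previous_name is not None:
            name = previous_name  # reuse the mate's name
        else:
            previous_name = name
        fields[0] = name
        yield "\t".join(fields) + "\n"


def fix_sam_file(in_path, out_path):
    """Rewrite a whole SAM file; both paths are supplied by the caller."""
    with open(in_path) as source, open(out_path, "w") as target:
        target.writelines(fix_read_names(source))
```

This only changes the first column. It does not touch positions, flags, or CIGAR strings, so it would not address any misalignment.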

    The main problem I have is that when I view the alignments in IGV, at least half the reads consist mostly of mismatches relative to the reference genome.
    I've tried all the different settings, fr, ff, rf, and rr.
    I cannot figure out why Rockhopper insists on aligning reads in what appears to be the wrong location.
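When comparing orientation settings, it can help to decode the FLAG column of the SAM records and see which strand each mate is reported on. The sketch below uses the bit meanings from the SAM specification; the helper names `decode_flag` and `strand` are hypothetical.

```python
# Bit meanings of the SAM FLAG field, as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def strand(flag):
    """Return '-' if the read is reported on the reverse strand, else '+'."""
    return "-" if flag & 0x10 else "+"


# Flags that appear in the excerpt from the original post.
for flag in (67, 131, 115, 179):
    print(flag, strand(flag), decode_flag(flag))
```

For the flags in the excerpt, 67 and 131 decode to both mates on the forward strand, and 115 and 179 to both mates on the reverse strand. Whether that matches the library's true orientation depends on the protocol, which the thread does not state.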

    I think I'll just give up on the software, even if it appears to be widely used in respected publications for E. coli RNA-Seq analysis.



  • Brian Bushnell
    replied
    It mostly looks like a normal sam file; the specification is here: https://samtools.github.io/hts-specs/SAMv1.pdf

    However, the second line has "+" for the read name, which is odd to say the least. Can you run head on the input fastq file to show the first 8 lines?

    Edit - looking at the attachment, it appears that either you have an odd fastq file with read2 always named "+" or that Rockhopper has a bug causing it to incorrectly report the read name.
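The check on the input FASTQ can also be done in Python. This is a rough sketch that assumes standard four-line FASTQ records (no wrapped sequences) and treats a path ending in ".gz" as gzip-compressed; the function name `summarize_fastq_head` is hypothetical.

```python
import gzip
from itertools import islice


def summarize_fastq_head(path, n_records=2):
    """Return the header and a basic sanity check for the first n_records records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        lines = [line.rstrip("\n") for line in islice(handle, 4 * n_records)]

    records = []
    # Step through complete 4-line records only.
    for start in range(0, len(lines) - len(lines) % 4, 4):
        header, sequence, separator, quality = lines[start:start + 4]
        records.append({
            "header": header,
            "well_formed": (
                header.startswith("@")
                and separator.startswith("+")
                and len(sequence) == len(quality)
            ),
        })
    return records
```

If the read 2 file has normal "@" headers here, the "+" names in the SAM file most likely come from the aligner rather than from the input.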



  • Unfamiliar SAM file format outputted by Rockhopper program

    Hi,

    I'm using Rockhopper to analyze E. coli RNA-Seq data.

    I'm not familiar with the SAM format output by Rockhopper.
    Has anyone seen this format before, or does anyone have ideas on how to convert it to the standard SAM format, which I could then view in IGV or on the UCSC Genome Browser? I'm quite comfortable with both Python and R, but I really don't understand the current format, so I'm unable to convert it.
    The data is paired-end.

    Here are the first fourteen lines from the SAM file.
    I've put more lines in the attached file.

    Code:
    [blancha@lg-1r14-n04 samFiles]$ samtools view -h -f 2 IK_21C-EM9-1_R1.sam | more
    @HD	VN:1.0	SO:unsorted
    @SQ	SN:gi|556503834|ref|NC_000913.3|	LN:4641652	SP:Escherichia coli str. K-12 substr. MG1655
    @PG	ID:Rockhopper	PN:Rockhopper	VN:2.03
    D69F08P1:403:C6Y8VACXX:5:1101:1436:2236 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	2527763	255	50M	=	2527927	213	TGGCAAATGGCATCCCGATGGCAAACATTCTGTTCCCCACATCGGTGATC	BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIII
    +	131	gi|556503834|ref|NC_000913.3|	2527927	255	49M	=	2527763	-213	CGCAACTGGTCCAGCCCCTGAAGCGTCCGCTTTAAGCTTTATCGGCGCT	BBBFFFFFFFFFFIIIIIIIFIIIIFIIIIIIIIIIIIIIIIIIIIIFF
    D69F08P1:403:C6Y8VACXX:5:1101:1606:2216 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3441734	255	50M	=	3441811	126	CGACAACCGTTATGAGGGATCGGAGTCACATCAGTAATGTTAGTGATGCG	BBBFBFF<F0<FFIIIIIF7FFFFFIIIIIIFFFBFFFF<FFFB7B7B<F
    +	131	gi|556503834|ref|NC_000913.3|	3441811	255	49M	=	3441734	-126	GAATCTGGAAGTTATGGTTAAAGGTCCGGGTCCAGGCCGCGAAACTACT	BBBBFBFFFBF<FFFIIB<FFFIBFFFFF7BBFFFFFIFFIFF<FFFFB
    D69F08P1:403:C6Y8VACXX:5:1101:1955:2210 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3471221	255	50M	=	3471324	152	CCCGTACGGTGGTGATTGCAGCGGTCAGAGTAGTTTTACCGTGGTCAACG	BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFFFIIIIFFIIIIIIIIIII
    +	131	gi|556503834|ref|NC_000913.3|	3471324	255	49M	=	3471221	-152	GCTCTCTCCTGAAGGGGAGAGCACTATAGTAAGGAATATAGCCGTGTCT	BBBFFFFFFFFFFIIIIIIIIIIIIIIFIFFIIIIIIIIIIIIIFIIII
    D69F08P1:403:C6Y8VACXX:5:1101:2133:2203 1:N:0:AGTCAAC	115	gi|556503834|ref|NC_000913.3|	1719838	255	50M	=	1719872	83	AAGAGACAGACCTACCATTGAAACAACCAATACGCGTTTAATCATTGAAA	BBBFFFFFFFFFFIIIIIIFFIIIFIIIIIIFFFBFBFFFIIIFFFFFFB
    +	179	gi|556503834|ref|NC_000913.3|	1719872	255	49M	=	1719838	-83	GCTTGCGTGGCGTTTCATGGTGAACAGGAGATTTTTCAATGATTAAACG	BBBFFFFFFFFFFFFFIIIIBFBFFIIIFFBFFFIIIIBFFBFIFBBFB
    D69F08P1:403:C6Y8VACXX:5:1101:1916:2222 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3444439	255	50M	=	3444490	100	CCCACGACCACCGGTTTTACCGAGGCCAGAACCGATACCACGACCCAGGC	BBBFFFFFFFFFFFFFFFFIFFII<BBFFFFIIFFIF<<<BF<BBFBF7B
    +	131	gi|556503834|ref|NC_000913.3|	3444490	255	49M	=	3444439	-100	TGCGTTTAAATACTCTGTCTCCGGCCGAAGGCTCCAAAAAGGCGGGTAA	BB<FFFFFFFFFFFBFFBBBFBFFFFFFFB7BFFIBFFFBFB<BBB0<B
    D69F08P1:403:C6Y8VACXX:5:1101:2117:2249 1:N:0:AGTCAAC	115	gi|556503834|ref|NC_000913.3|	639393	255	50M	=	639501	157	GGCGACGCCAACGCCGCTATGGCGTGAAAGACGAAGGAAATTTAGATTTT	<BBFBFFFBBFBFFFIFFBFFIIIIIFBFFIIIIF7<BF<BBBBBBBBB<
    +	179	gi|556503834|ref|NC_000913.3|	639501	255	49M	=	639393	-157	GTAAAATCAAAGCAGCACAGTACGTAGCTTCTCACCCAGGTGAAGTTTG	B<BFFFFFFFFFFFBFFFFBBFFFFFFIIIFFFIFFBFFFFIBFFBFFF
    Thank you for your help.
    Last edited by blancha; 07-09-2015, 04:11 PM. Reason: Put lines from SAM file in Code box
