Thank you, Brian.
You are correct that the only problem with the format is the "+" in the read-name field of every other line.
The "+" record is simply the second read of the pair, from the paired FASTQ file.
If this were the only issue I had with Rockhopper, I would be happy.
The main problem is that when I view the alignments in IGV, at least half the reads are mostly composed of mismatches relative to the reference genome.
I've tried all the read-orientation settings: fr, ff, rf, and rr.
I cannot figure out why Rockhopper insists on aligning reads to what appears to be the wrong location.
I think I'll just give up on the software, even though it appears to be widely used in respected publications on E. coli RNA-Seq analysis.
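One way to look into the orientation question is to tally which strand each mate is reported on, using FLAG bit 0x10 (read reverse-complemented). This is a minimal sketch, assuming the layout seen in the Rockhopper excerpt from the original post, where each mate-2 record directly follows its mate-1 record. In that excerpt, flags 67/131 mark both mates as forward and 115/179 mark both as reverse. Paired-end reads from a standard fr library are usually reported with the mates on opposite strands. The sketch only counts the combinations and does not by itself explain the mismatches seen in IGV. The function names `strand` and `tally_orientations` are hypothetical.

```python
from collections import Counter
from typing import Iterable

FLAG_PAIRED = 0x1    # template has multiple segments
FLAG_REVERSE = 0x10  # SEQ is reverse complemented
FLAG_FIRST = 0x40    # first segment in the template (mate 1)
FLAG_SECOND = 0x80   # last segment in the template (mate 2)


def strand(flag: int) -> str:
    """'-' if the record is reported reverse complemented, '+' otherwise."""
    return "-" if flag & FLAG_REVERSE else "+"


def tally_orientations(lines: Iterable[str]) -> Counter:
    """Count (mate 1 strand, mate 2 strand) combinations in a SAM stream.

    Assumes, as in the Rockhopper excerpt, that each mate-2 record directly
    follows its mate-1 record. Header lines and unpaired reads are skipped.
    """
    counts: Counter = Counter()
    mate1_strand = None  # strand of the most recent unmatched mate-1 record
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        # Split on tabs only: Rockhopper's QNAME contains a space.
        flag = int(line.rstrip("\n").split("\t")[1])
        if not flag & FLAG_PAIRED:
            continue
        if flag & FLAG_FIRST:
            mate1_strand = strand(flag)
        elif flag & FLAG_SECOND and mate1_strand is not None:
            counts[(mate1_strand, strand(flag))] += 1
            mate1_strand = None
    return counts
```

For example, `with open("IK_21C-EM9-1_R1.sam") as fh: print(tally_orientations(fh))` would print a Counter keyed by (mate 1 strand, mate 2 strand). Mostly ("+", "+") and ("-", "-") keys would indicate same-strand mates throughout the file.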
It mostly looks like a normal SAM file; the specification is here: https://samtools.github.io/hts-specs/SAMv1.pdf
However, the second record has "+" for the read name, which is odd, to say the least. Can you run head on the input FASTQ file to show the first 8 lines?
Edit: looking at the attachment, it appears that either your FASTQ file always names read 2 "+", or Rockhopper has a bug that causes it to report the read name incorrectly.
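If the "+" names do come from Rockhopper rather than from the FASTQ, a minimal Python sketch of the conversion the original poster asked about is to give each "+" record the name of the record just before it. This rests on two assumptions drawn only from the excerpt. First, each "+" record is the mate of the preceding record. Second, the FASTQ comment after the first space (`1:N:0:AGTCAAC`) can be dropped, because the SAM specification does not allow spaces in QNAME and treats records with identical QNAME as belonging to the same template. `fix_qnames` is a hypothetical name, not part of Rockhopper or samtools.

```python
from typing import Iterable, Iterator


def fix_qnames(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines with Rockhopper's '+' mate names replaced.

    Assumptions drawn only from the excerpt in this thread:
    - a record whose QNAME is exactly '+' is the mate of the record just before it;
    - text after the first space in a QNAME is a FASTQ comment and is dropped,
      because the SAM specification does not allow spaces in QNAME.
    """
    previous_name = None  # cleaned name of the last record not named '+'
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            yield line + "\n"  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        if fields[0] == "+":
            if previous_name is None:
                raise ValueError("found a '+' record with no preceding mate")
            fields[0] = previous_name
            previous_name = None  # each name is reused for one mate only
        else:
            fields[0] = fields[0].split(" ", 1)[0]
            previous_name = fields[0]
        yield "\t".join(fields) + "\n"
```

Typical use would be `with open("IK_21C-EM9-1_R1.sam") as src, open("fixed.sam", "w") as dst: dst.writelines(fix_qnames(src))`, where the output name is arbitrary. Running `samtools sort -o fixed.sorted.bam fixed.sam` and then `samtools index fixed.sorted.bam` produces an indexed BAM that IGV can load. Renaming the mates leaves every alignment position unchanged, so it would not by itself change how the reads look in IGV.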
Unfamiliar SAM file format output by the Rockhopper program
Hi,
I'm using Rockhopper to analyze E. coli RNA-Seq data.
I'm not familiar with the SAM format output by Rockhopper.
Has anyone seen this format before, or does anyone have ideas on how to convert it to the traditional format, which I could then view in IGV or on the UCSC Genome Browser? I'm quite comfortable with both Python and R, but I really don't understand the current format, so I'm unable to convert it.
The data is paired-end.
Here are the first lines of the SAM file (the header plus six read pairs).
I've put more lines in the attached file.
Code:
```
[blancha@lg-1r14-n04 samFiles]$ samtools view -h -f 2 IK_21C-EM9-1_R1.sam | more
@HD	VN:1.0	SO:unsorted
@SQ	SN:gi|556503834|ref|NC_000913.3|	LN:4641652	SP:Escherichia coli str. K-12 substr. MG1655
@PG	ID:Rockhopper	PN:Rockhopper	VN:2.03
D69F08P1:403:C6Y8VACXX:5:1101:1436:2236 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	2527763	255	50M	=	2527927	213	TGGCAAATGGCATCCCGATGGCAAACATTCTGTTCCCCACATCGGTGATC	BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIII
+	131	gi|556503834|ref|NC_000913.3|	2527927	255	49M	=	2527763	-213	CGCAACTGGTCCAGCCCCTGAAGCGTCCGCTTTAAGCTTTATCGGCGCT	BBBFFFFFFFFFFIIIIIIIFIIIIFIIIIIIIIIIIIIIIIIIIIIFF
D69F08P1:403:C6Y8VACXX:5:1101:1606:2216 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3441734	255	50M	=	3441811	126	CGACAACCGTTATGAGGGATCGGAGTCACATCAGTAATGTTAGTGATGCG	BBBFBFF<F0<FFIIIIIF7FFFFFIIIIIIFFFBFFFF<FFFB7B7B<F
+	131	gi|556503834|ref|NC_000913.3|	3441811	255	49M	=	3441734	-126	GAATCTGGAAGTTATGGTTAAAGGTCCGGGTCCAGGCCGCGAAACTACT	BBBBFBFFFBF<FFFIIB<FFFIBFFFFF7BBFFFFFIFFIFF<FFFFB
D69F08P1:403:C6Y8VACXX:5:1101:1955:2210 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3471221	255	50M	=	3471324	152	CCCGTACGGTGGTGATTGCAGCGGTCAGAGTAGTTTTACCGTGGTCAACG	BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFFFIIIIFFIIIIIIIIIII
+	131	gi|556503834|ref|NC_000913.3|	3471324	255	49M	=	3471221	-152	GCTCTCTCCTGAAGGGGAGAGCACTATAGTAAGGAATATAGCCGTGTCT	BBBFFFFFFFFFFIIIIIIIIIIIIIIFIFFIIIIIIIIIIIIIFIIII
D69F08P1:403:C6Y8VACXX:5:1101:2133:2203 1:N:0:AGTCAAC	115	gi|556503834|ref|NC_000913.3|	1719838	255	50M	=	1719872	83	AAGAGACAGACCTACCATTGAAACAACCAATACGCGTTTAATCATTGAAA	BBBFFFFFFFFFFIIIIIIFFIIIFIIIIIIFFFBFBFFFIIIFFFFFFB
+	179	gi|556503834|ref|NC_000913.3|	1719872	255	49M	=	1719838	-83	GCTTGCGTGGCGTTTCATGGTGAACAGGAGATTTTTCAATGATTAAACG	BBBFFFFFFFFFFFFFIIIIBFBFFIIIFFBFFFIIIIBFFBFIFBBFB
D69F08P1:403:C6Y8VACXX:5:1101:1916:2222 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3444439	255	50M	=	3444490	100	CCCACGACCACCGGTTTTACCGAGGCCAGAACCGATACCACGACCCAGGC	BBBFFFFFFFFFFFFFFFFIFFII<BBFFFFIIFFIF<<<BF<BBFBF7B
+	131	gi|556503834|ref|NC_000913.3|	3444490	255	49M	=	3444439	-100	TGCGTTTAAATACTCTGTCTCCGGCCGAAGGCTCCAAAAAGGCGGGTAA	BB<FFFFFFFFFFFBFFBBBFBFFFFFFFB7BFFIBFFFBFB<BBB0<B
D69F08P1:403:C6Y8VACXX:5:1101:2117:2249 1:N:0:AGTCAAC	115	gi|556503834|ref|NC_000913.3|	639393	255	50M	=	639501	157	GGCGACGCCAACGCCGCTATGGCGTGAAAGACGAAGGAAATTTAGATTTT	<BBFBFFFBBFBFFFIFFBFFIIIIIFBFFIIIIF7<BF<BBBBBBBBB<
+	179	gi|556503834|ref|NC_000913.3|	639501	255	49M	=	639393	-157	GTAAAATCAAAGCAGCACAGTACGTAGCTTCTCACCCAGGTGAAGTTTG	B<BFFFFFFFFFFFBFFFFBBFFFFFFIIIFFFIFFBFFFFIBFFBFFF
```
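The numeric second column in these records is the standard SAM FLAG bit field. Apart from the read names, the records follow the specification's column layout. The minimal stdlib Python decoder below shows what the four values in the excerpt encode. The bit meanings are paraphrased from the FLAG table in the SAM specification; the dictionary wording and the name `decode_flag` are illustrative, not from any library.

```python
from typing import List

# Bit meanings paraphrased from the FLAG table of the SAM specification.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "properly paired",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse complemented",
    0x20: "mate reverse complemented",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> List[str]:
    """Return the descriptions of every bit set in a SAM FLAG value, lowest bit first."""
    return [desc for bit, desc in FLAG_BITS.items() if flag & bit]


# The four FLAG values that appear in the Rockhopper excerpt.
for value in (67, 131, 115, 179):
    print(value, decode_flag(value))
```

Run as-is, the script reports that 67 and 131 are the first and second reads of a proper pair with neither read reverse-complemented. It also reports that 115 and 179 are the same pair roles with both the read and its mate marked reverse-complemented.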