Originally posted by ssully
I have removed the linkers and split the 454 mate-pair reads with sff_extract. After deinterlacing, I now have a pair of FASTQ files (454_1.fastq and 454_2.fastq) containing reads _1 and _2 only, respectively. In each case Read_1 is the pre-linker part and Read_2 is the post-linker part of the original read, both in forward orientation:
schematic of original read
Code:
================================^^^^^^^^^^^^^^^=======================
454_1--->                           linker     454_2--->
When assembled, however, they should be ordered _2 --> _1 (again both in forward, i.e., 5'--3', orientation), with the library insert size between them:
schematic of assembled reads
Code:
454_2                                                    454_1
-------->                     (~3kb)                     -------->
==================================================================
Would a YAML readset section like this work?
{
  orientation: "ff",
  type: "mate-pairs",
  right reads: [
    "/FULL_PATH_TO_DATASET/454_1.fastq"
  ],
  left reads: [
    "/FULL_PATH_TO_DATASET/454_2.fastq"
  ]
},
Or should it be like this?
{
  orientation: "ff",
  type: "mate-pairs",
  right reads: [
    "/FULL_PATH_TO_DATASET/454_2.fastq"
  ],
  left reads: [
    "/FULL_PATH_TO_DATASET/454_1.fastq"
  ]
},
(I adapted these views from http://seqanswers.com/forums/showpos...85&postcount=2 )
Last edited by ssully; 12-02-2014, 07:10 PM.
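The geometry described in this post can be sketched with a toy model. This is only an illustration, assuming the library was made by circularizing each ~3 kb fragment through the linker, which is consistent with the _2 --> _1 ordering given above. The function names, the `^` stand-in linker, and the random genome are all hypothetical.

```python
import random

LINKER = "^" * 15  # stand-in marker, not a real 454 linker sequence


def mate_pair_read(genome, start, insert_size, arm):
    """Toy model: join the right end of a fragment to its left end via the linker."""
    fragment = genome[start:start + insert_size]
    # After circularization, the sequenced piece reads through the junction:
    # right end of the fragment, then linker, then left end of the fragment.
    return fragment[-arm:] + LINKER + fragment[:arm]


def split_on_linker(read):
    """Return (pre-linker part, post-linker part), i.e. reads _1 and _2."""
    pre, _, post = read.partition(LINKER)
    return pre, post


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))  # made-up genome

read = mate_pair_read(genome, start=1000, insert_size=3000, arm=40)
read_1, read_2 = split_on_linker(read)
pos_1 = genome.find(read_1)  # where the pre-linker read maps
pos_2 = genome.find(read_2)  # where the post-linker read maps
print("454_1 maps at", pos_1, "and 454_2 maps at", pos_2)
```

Under this model both halves map in forward orientation, with the post-linker read upstream of the pre-linker read, so the left/right question is about which file sits at which end of the genomic fragment.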
-
Anyway, you can simply feed the data to SPAdes and check whether it inferred the insert size distribution properly.
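One way to act on this suggestion is to pull the estimated insert size out of the assembler's log and compare it with the expected ~3 kb. The sketch below is hedged: the wording of the log line ("insert size = ..., deviation = ...") is an assumption and may differ between SPAdes versions, so the pattern should be adjusted to whatever your log actually prints. The sample text and helper names are made up for illustration.

```python
import re

# Assumed wording of the log line; adjust to what your SPAdes version prints.
INSERT_SIZE_RE = re.compile(
    r"insert size\s*=\s*(\d+(?:\.\d+)?),\s*deviation\s*=\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_insert_sizes(log_text):
    """Return a list of (mean, deviation) pairs found in the log text."""
    return [
        (float(m.group(1)), float(m.group(2)))
        for m in INSERT_SIZE_RE.finditer(log_text)
    ]


def looks_plausible(mean, expected, tolerance=0.5):
    """True if the estimated mean is within tolerance * expected of expected."""
    return abs(mean - expected) <= tolerance * expected


# Hypothetical log excerpt, invented for this example.
sample_log = "... Insert size = 2950.5, deviation = 420.25, ...\n"

for mean, deviation in find_insert_sizes(sample_log):
    print(mean, deviation, looks_plausible(mean, expected=3000))
```

An estimate far from the library's nominal insert size, or no estimate at all, would suggest the orientation or the left/right assignment is wrong.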
-
I don't know; the second variant seems to me to be saying, 'the reads from the right side of the library read (post-linker, 454_2.fastq) belong at the right end of the genome fragment', which would be incorrect.
For me it really comes down to what 'right reads' and 'left reads' mean in the YAML specification:
e.g., does 'right reads' refer to a read's position in the 454 mate-pair library read (i.e., right side/post-linker in the 454 read, but maps to the left end of the genomic fragment) or to its position with respect to the genome (i.e., maps to the right end of the genomic fragment, but comes from the left side/pre-linker half of the 454 read)?
(It's also unusual to me that 'right reads' is specified before 'left reads' in the YAML, for both paired-end and mate-pair types, given that sequences are typically read by humans from left to right, 5' to 3'. Is there a particular reason for that?)
But anyway, I can try inputting it both ways, in two runs, and see which one assembles the 454 mate pairs correctly.
Last edited by ssully; 12-03-2014, 01:12 PM.
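A small sketch for preparing the two runs: it builds one dataset description per left/right assignment so each can be saved to its own file and passed to spades.py via --dataset. It emits JSON, which is a subset of YAML; that SPAdes accepts quoted keys in this form is an assumption to verify. The output file names and helper functions are hypothetical.

```python
import json


def mate_pair_section(left_reads, right_reads, orientation="ff"):
    """Build one readset section using the keys from the YAML in this thread."""
    return {
        "orientation": orientation,
        "type": "mate-pairs",
        "right reads": [right_reads],
        "left reads": [left_reads],
    }


def dataset_text(left_reads, right_reads):
    """Serialize a one-library dataset as JSON (a subset of YAML)."""
    return json.dumps([mate_pair_section(left_reads, right_reads)], indent=2)


PREFIX = "/FULL_PATH_TO_DATASET"  # placeholder path, as in the thread

variants = {
    # Hypothetical file names, one per candidate assignment.
    "454_right1_left2.yaml": dataset_text(
        left_reads=f"{PREFIX}/454_2.fastq", right_reads=f"{PREFIX}/454_1.fastq"
    ),
    "454_right2_left1.yaml": dataset_text(
        left_reads=f"{PREFIX}/454_1.fastq", right_reads=f"{PREFIX}/454_2.fastq"
    ),
}

for name, text in variants.items():
    print(f"# {name}")
    print(text)
```

Keeping everything except the left/right assignment identical between the two files makes the comparison between runs easier to interpret.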
-
I worked out the correct orientation and order of 454 paired reads input for SPAdes, and have corrected the reads with the --iontorrent option (ionhammer). But now I have questions regarding ionhammer error correction: does it pay any attention to FASTQ quality scores?
Here is an original paired-end sff read (converted to fastq; note the 'sanger style' quality scores, and lower case for low-quality bases). I have underlined the part that constitutes the 'post linker' read.
sff to fastq
Code:
@GIDY76W02G4JWL
tcagTTATTGATCAGTATTAGAATGAGGCCTATTAATAGCCAATTATCACATTTTGGATCTATTTTGTATCGATGATATCATTTATCGATAATCATCATAGTTATTTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA[U]TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGctgagactgccaaggcacacaggggatagg[/U]n
+
III;;;;BIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII;:8599>>@:9////92EBEDDDGIIIIIIFEC?:??IIHHHEIIIIIIIIIICCECC:??C==?EEEIEGHHIIIIIIIIGHFHGIIIIC?==CIIIIEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII!
sffToCA
Code:
@GIDY76W02G4JWLb clr=0,95 clv=1,0 max=1,0 tnt=1,0 rnd=t
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
+
IEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII
Here was my SPAdes command:
Code:
spades.py --only-error-correction --iontorrent --dataset 454_4.yaml -t 8 --sc -k 21,33,55 --disable-gzip-output -o sff2ca_spades_corrected
And here is the output of ionhammer for the above read:
Code:
>GIDY76W02G4JWLb
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCT[U]G[/U]TTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
So, I'm not clear on what ionhammer should be doing; it appears I need to quality-trim my 454 reads *before* running them through ionhammer...*OR* I need to preserve the lower-case base formatting in the input file?
Last edited by ssully; 12-06-2014, 07:50 AM.
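To look into this kind of question, it can help to list exactly where a corrected read differs from its input and what the original quality scores were at those positions. The following is a minimal sketch, assuming Sanger-style (Phred+33) quality strings as in the read above. `phred33` and `list_edits` are hypothetical helpers, not part of SPAdes or ionhammer, and the toy read is made up.

```python
from difflib import SequenceMatcher


def phred33(qual):
    """Decode a Sanger-style (Phred+33) quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def list_edits(original, corrected, qual):
    """Return one (tag, position, old, new, old_quals) tuple per changed span.

    position is a 0-based offset into `original`; old_quals holds the Phred
    scores of the original bases that were deleted or replaced.
    """
    scores = phred33(qual)
    # Compare case-insensitively so lower-case (low-quality) bases still match.
    matcher = SequenceMatcher(
        None, original.upper(), corrected.upper(), autojunk=False
    )
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append(
                (tag, i1, original[i1:i2], corrected[j1:j2], scores[i1:i2])
            )
    return edits


# Made-up toy read (not from the thread): one G of a GG run is dropped.
toy_original = "ACGGT"
toy_quality = "II95I"
toy_corrected = "ACGT"

for edit in list_edits(toy_original, toy_corrected, toy_quality):
    print(edit)
```

Within a homopolymer run the alignment cannot say which of the identical bases was removed, so the reported position should be read as "somewhere in this run". Running the same comparison on the sffToCA read and its ionhammer output would show the quality scores around the dropped G.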