You should find "results.tab" in your SOLiD SAGE output folder. I simply count the number of read IDs (4th column) in that file; if a read appears more than once, I count it only once. I then divide that number by the total number of reads, which you can get from the csfasta files. Hope this helps.
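A minimal sketch of the counting described above, in Python. It assumes results.tab is tab-delimited with the read ID in the 4th column and no header line other than lines starting with "#", and that each read in a csfasta file has one header line starting with ">". The function names are made up for illustration and are not part of the SOLiD SAGE tool.

```python
import csv


def count_unique_read_ids(results_path, id_column=3):
    """Count distinct read IDs in results.tab (4th column, 0-based index 3)."""
    ids = set()
    with open(results_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            # Skip blank/short lines and comment lines (assumed to start with '#').
            if len(row) <= id_column or row[0].startswith("#"):
                continue
            ids.add(row[id_column])
    return len(ids)


def count_csfasta_reads(csfasta_path):
    """Count reads in a csfasta file: one '>' header line per read."""
    with open(csfasta_path) as fh:
        return sum(1 for line in fh if line.startswith(">"))


def mapped_fraction(results_path, csfasta_paths):
    """Unique mapped read IDs divided by the total number of reads."""
    total = sum(count_csfasta_reads(p) for p in csfasta_paths)
    if total == 0:
        raise ValueError("no reads found in the csfasta files")
    return count_unique_read_ids(results_path) / total
```

Calling `mapped_fraction("results.tab", ["reads.csfasta"])` would return the fraction of reads that appear at least once in results.tab. If your results.tab has a header row, skip it first or it will be counted as one extra ID.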
-
It's dangerous to count tags only once at random. I postprocess the results.tab file and look up the RefSeq ID in a MySQL table which links GenBank IDs with RefSeq. A single RefSeq may have many tags associated with it, but a tag which maps to multiple targets must be discarded: how would you know which was the correct parent sequence?
regards
Alessandro
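A rough sketch of that filtering step, in Python. It assumes the hits have already been reduced to (read ID, RefSeq ID) pairs, for example after the GenBank to RefSeq lookup; the function name and the input shape are hypothetical and not taken from the SOLiD SAGE output format.

```python
from collections import defaultdict


def unambiguous_tag_counts(hits):
    """Count tags per RefSeq, discarding tags that hit more than one RefSeq.

    hits: iterable of (read_id, refseq_id) pairs (hypothetical input shape).
    """
    targets = defaultdict(set)
    for read_id, refseq_id in hits:
        targets[read_id].add(refseq_id)

    counts = defaultdict(int)
    for read_id, refs in targets.items():
        # Keep only tags whose hits all point to a single RefSeq.
        if len(refs) == 1:
            counts[next(iter(refs))] += 1
    return dict(counts)
```

A tag that hits the same RefSeq several times is still counted once, while a tag that hits two different RefSeqs is dropped entirely rather than assigned to one of them.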
-
Hi
I'm a newbie trying to find DEGs through RefSeq data from the SOLiD platform. I'm trying to do the alignment with SOLiD SAGE V 1.1.0, but it just doesn't work well. I get a huge number of messages like:
"Use of uninitialized value in hash element at solid.sage.v110.pl line 1373, <F> line 1187473."
I'm using the default options with correct input data (I also tried the official sample data), but I'm not sure about the reference genome. My data is from human, so I'm using the human genome from here
chromFa.tar.gz - The assembly sequence in one file per chromosome.
The reference is already around 4 GB; I don't know whether I was supposed to use something more compact.
Could anyone help me with this?
thanks
Pej
-
SAGE Hs reference genome format
Originally posted by Pejman:
"I get a huge number of messages like: 'Use of uninitialized value in hash element at solid.sage.v110.pl line 1373, <F> line 1187473.' [...] I'm not sure about the reference genome. My data is from human, so I'm using the human genome [...] chromFa.tar.gz - The assembly sequence in one file per chromosome."
Chromosome headers have this format:
>gi|224384759|gb|CM000672.1| Homo sapiens chromosome 10, GRC primary reference assembly
I think this could be the cause of your problem.
Kind regards
Alessandro
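A quick way to check which header style a reference uses, sketched in Python. The regular expression only describes the NCBI-style header shown above; whether the SAGE script actually requires that style is not confirmed in this thread, and the function names are hypothetical.

```python
import re

# Matches headers shaped like ">gi|224384759|gb|CM000672.1| ..." (the example above).
NCBI_HEADER = re.compile(r"^>gi\|\d+\|[a-z]+\|[^|]+\|")


def fasta_headers(path):
    """Yield every header line (starting with '>') of a FASTA file."""
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                yield line.rstrip("\n")


def non_ncbi_headers(path):
    """Return the headers that do not follow the gi|...|gb|...| pattern."""
    return [h for h in fasta_headers(path) if not NCBI_HEADER.match(h)]
```

Running `non_ncbi_headers` on each chromosome file lists any headers in a different style (for example a bare `>chr10`), which is a cheap first check before rebuilding the reference.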
-
Hi
I was wondering if someone could please comment on how the tags are generated by SOLiD SAGE? I have seen their protocol at: http://tools.invitrogen.com/content/...D_SAGE_man.pdf
where page 2 explains how the tags are generated for sequencing. What I understood, and later found in actual mappings, was that relative to a cDNA sequence (from RefSeq, for example) the tags generated by SOLiD-SAGE were reverse complemented!
Generally, one finds the tags starting with T and ending in 131, representing CATG (there are a few more bases following CATG, but these are not used in mappings so I am not counting those). The SOLiD-SAGE software tool v1.10 maps these tags to the reverse complement of RefSeqs. We contacted our local LifeTech support person and were told that the tags are just sequenced in the 3' to 5' direction, so they shouldn't map to reverse complements in RefSeq. This has confused me!
Has anyone tried to map the tags using any tool other than the SOLiD-SAGE tool (like BWA or PerM)? If you did, what was your experience with regard to the orientation of the tags? I ask because the SOLiD-SAGE tool takes the whole RefSeq fasta file and creates virtual tags from it, while with something like PerM we make the virtual tags ourselves before mapping, so creating the right tags is the first important step. I would appreciate it if someone could comment, please.
Cheers
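For anyone building virtual tags by hand, here is a small Python sketch of the pieces involved: reverse complementing a sequence, converting it to SOLiD color space, and cutting a tag at the 3'-most CATG. The color encoding is the standard SOLiD two-base code; the tag length of 27 and the function names are assumptions for illustration only, not what the SOLiD-SAGE tool does internally.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_colors(seq):
    """Encode a base sequence as SOLiD colors (one color per adjacent base pair)."""
    seq = seq.upper()
    # With A=0, C=1, G=2, T=3 the color of a dinucleotide is the XOR of its bases.
    return "".join(str(_INDEX[a] ^ _INDEX[b]) for a, b in zip(seq, seq[1:]))


def virtual_tag(transcript, tag_len=27, site="CATG"):
    """Return the 3'-most CATG plus tag_len downstream bases, or None.

    tag_len=27 is an assumed value; adjust it to your protocol.
    """
    pos = transcript.upper().rfind(site)
    if pos == -1:
        return None
    return transcript[pos:pos + len(site) + tag_len]
```

`to_colors("CATG")` gives `"131"`, and because CATG is its own reverse complement, the reverse complement of a tag that starts with CATG ends with CATG and therefore ends in 131 in color space. That matches reads ending in 131 if they are reverse complements of the transcript-strand tag, so it may be worth generating virtual tags in both orientations and seeing which one your reads hit.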