  • smarkel
    replied
    Thank you for the explanation.


  • nilshomer
    replied
    Originally posted by smarkel
    Thank you. Yes, you're right. Both files are individually sorted. I should have worded my original question differently. I expected xxx_F3.csfasta and xxx_R3.csfasta to have the same number of entries, with the nth entry in the F3 file being the mate of the nth entry in the R3 file. What does it mean for a paired read's mate to not exist?
    It means that, given some quality threshold, a proper mate at that location could not be identified. I use "unpaired" reads without any problem. After >10 slides those extra reads really add up!


  • smarkel
    replied
    Thank you. Yes, you're right. Both files are individually sorted. I should have worded my original question differently. I expected xxx_F3.csfasta and xxx_R3.csfasta to have the same number of entries, with the nth entry in the F3 file being the mate of the nth entry in the R3 file. What does it mean for a paired read's mate to not exist?


  • nilshomer
    replied
    Originally posted by smarkel
    The files I'm using are from AB's site (http://solidsoftwaretools.com/gf/project/ecoli2x50/) and aren't sorted. I looked at Rosalind_20080729_2_Chris5_F3.csfasta.zip and Rosalind_20080729_2_Chris5_R3.csfasta.zip. My understanding is that the sorting happens when the reads are mapped. Maybe the posted files aren't representative.
    I wasn't too descriptive in my explanation.

    Take a look at these reads from real data:
    Code:
    >1_6_55_F3
    T01100000000201002010120012300200011000100.01101131
    >1_6_64_F3
    T01203010110102003000000101100111000010100.01100131
    >1_6_69_F3
    T01031320200103032011110221111112110020111.11111131
    >1_6_97_F3

    I claim they are sorted based on read name. The names have the form:
    >%d_%d_%d_F3
    where %d stands for some integer. In the excerpt above the left-most and middle integers stay fixed while the right-most increases, so the right-most integer varies fastest, then the middle integer, then the left-most integer. The equivalent read (the mate) in the R3 file will be
    >%d_%d_%d_R3

    The "Rosalind" file follows the same pattern:
    Code:
    >469_26_42_F3
    T12113310031232112221003120021221223320222122212122
    >469_26_379_F3
    T31202223003310000130302323312223212011000010033200
    >469_26_540_F3
    T11012313031030123033113130100223110001231232303210
    >469_26_560_F3
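
    The name pattern described above can be sketched in Python as follows. This is only an illustration, not code from BFAST or MAQ: read_key and mate_name are hypothetical helper names, and comparing the three integers left to right is an assumption that happens to be consistent with both excerpts. The integers have to be compared as numbers, since as plain strings "379" would sort before "42".

```python
import re

# Hypothetical helpers; the >%d_%d_%d_F3 / >%d_%d_%d_R3 layout is taken from the post.
READ_NAME = re.compile(r"^>?(\d+)_(\d+)_(\d+)_(F3|R3)$")


def read_key(name):
    """Return the three integers of a read name as a tuple, e.g. (1, 6, 55)."""
    m = READ_NAME.match(name.strip())
    if m is None:
        raise ValueError(f"unexpected read name: {name!r}")
    return tuple(int(g) for g in m.groups()[:3])


def mate_name(name):
    """Swap the F3/R3 suffix to get the name the mate would carry."""
    name = name.strip()
    m = READ_NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected read name: {name!r}")
    other = "R3" if m.group(4) == "F3" else "F3"
    return name[:-2] + other


if __name__ == "__main__":
    names = [">469_26_42_F3", ">469_26_379_F3", ">469_26_540_F3", ">469_26_560_F3"]
    # Assumption: tuples are compared left to right (left-most integer first).
    print(sorted(names, key=read_key) == names)
    print(mate_name(names[0]))
```

    Two reads are mates when read_key gives the same tuple for the F3 name and the R3 name.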


  • smarkel
    replied
    The files I'm using are from AB's site (http://solidsoftwaretools.com/gf/project/ecoli2x50/) and aren't sorted. I looked at Rosalind_20080729_2_Chris5_F3.csfasta.zip and Rosalind_20080729_2_Chris5_R3.csfasta.zip. My understanding is that the sorting happens when the reads are mapped. Maybe the posted files aren't representative.


  • nilshomer
    replied
    Originally posted by smarkel
    I know that the read order of xxx_F3.csfasta and xxx_F3_QV.qual are the same, but I can't find any information that describes how the read order of xxx_F3.csfasta and xxx_R3.csfasta are related. I'd appreciate any pointers to documentation that describe the order relationship.
    In practice, they will be sorted based on read name, with the numbers indicating panel, x-position, y-position. If a read has no mate, then it will only be present in one of the files. See solid2fastq programs/scripts like the ones in BFAST or MAQ that use the above properties.
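
    A minimal sketch of how those two properties (both files sorted by read name, unmated reads present in only one file) allow pairing in a single pass. This is not the BFAST or MAQ solid2fastq code. read_csfasta and pair_reads are hypothetical names, the sketch assumes one sequence line per record and header lines starting with "#", and it assumes the three integers compare left to right. The R3 sequences in the example are placeholders, not real data.

```python
import io
import re

NAME = re.compile(r"^>(\d+)_(\d+)_(\d+)_(?:F3|R3)$")


def read_csfasta(handle):
    """Yield (key, name, sequence) per record; blank and '#' lines are skipped."""
    name = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line
        elif name is not None:
            m = NAME.match(name)
            if m is None:
                raise ValueError(f"unexpected read name: {name!r}")
            yield tuple(map(int, m.groups())), name, line
            name = None


def pair_reads(f3_handle, r3_handle):
    """Merge two name-sorted streams; a missing mate is reported as None."""
    f3_iter, r3_iter = read_csfasta(f3_handle), read_csfasta(r3_handle)
    f3, r3 = next(f3_iter, None), next(r3_iter, None)
    while f3 is not None or r3 is not None:
        if r3 is None or (f3 is not None and f3[0] < r3[0]):
            yield f3, None          # F3 read with no R3 mate
            f3 = next(f3_iter, None)
        elif f3 is None or r3[0] < f3[0]:
            yield None, r3          # R3 read with no F3 mate
            r3 = next(r3_iter, None)
        else:
            yield f3, r3            # same key in both files: a proper pair
            f3, r3 = next(f3_iter, None), next(r3_iter, None)


if __name__ == "__main__":
    f3 = io.StringIO(
        ">1_6_55_F3\nT01100000000201002010120012300200011000100.01101131\n"
        ">1_6_64_F3\nT01203010110102003000000101100111000010100.01100131\n"
    )
    # Placeholder R3 records, made up for illustration only.
    r3 = io.StringIO(">1_6_55_R3\nG0000\n>1_6_69_R3\nG0000\n")
    for a, b in pair_reads(f3, r3):
        print(a[1] if a else "-", b[1] if b else "-")
```

    Because both inputs are read once in order, memory use stays constant regardless of file size; the sketch would mispair reads if either file were not sorted as described.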


  • smarkel
    started a topic matching unmapped paired SOLiD reads

    matching unmapped paired SOLiD reads

    I know that the read order of xxx_F3.csfasta and xxx_F3_QV.qual are the same, but I can't find any information that describes how the read order of xxx_F3.csfasta and xxx_R3.csfasta are related. I'd appreciate any pointers to documentation that describe the order relationship.
