  • #16
    Originally posted by cliff View Post
    I have tried ill2sanger, but still got the same problem.
    ...
    This problem is exactly the same as what I saw after sol2sanger. And all the other lanes are fine except this one.

    Do you have thoughts on this?
    That is a shame - sometimes I don't want to be right about things:
    Originally posted by maubp View Post
    This probably won't make any difference to the file size oddity. The difference between sol2sanger and ill2sanger is how they map the quality scores.
    My other thought was there was some kind of data corruption in your file(s).
    Originally posted by maubp View Post
    It would also be worth re-downloading the FASTQ files (from your service provider or collaborator, wherever you got them from) just in case there was corruption on transfer. That could explain the file size oddity. It's a long shot though.
    You could (as suggested earlier) try another tool for the conversion.

    How is your scripting (e.g. Perl or Python)? What might be worth doing now is some basic validation like checking the read length distribution of both FASTQ files (before and after conversion). For Solexa/Illumina all the reads should be the same length. Also check that the two files have matching paired read names (before and after conversion).
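    As a rough illustration of the checks described above, here is a minimal Python sketch. It assumes plain four-line FASTQ records (no line wrapping) and read names that end in /1 and /2 for the two mates; the function names are made up for this example and do not come from any existing tool.

```python
from collections import Counter
from itertools import zip_longest


def fastq_records(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def pair_key(name):
    """Drop a trailing /1 or /2 so both mates share one key (assumed naming)."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def length_distribution(handle):
    """Count how many reads there are of each length."""
    return Counter(len(seq) for _, seq, _ in fastq_records(handle))


def first_mismatch(handle1, handle2):
    """Return the 0-based index of the first record whose names disagree.

    A file that runs out of records early also counts as a mismatch.
    Returns None when both files pair up from start to finish.
    """
    pairs = zip_longest(fastq_records(handle1), fastq_records(handle2))
    for index, (rec1, rec2) in enumerate(pairs):
        if rec1 is None or rec2 is None:
            return index
        if pair_key(rec1[0]) != pair_key(rec2[0]):
            return index
    return None
```

    Run length_distribution on each file before and after conversion; more than one length in a Solexa/Illumina lane is a warning sign. first_mismatch on read1 and read2 shows where the pairing first goes out of step, if it does.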
    Last edited by maubp; 12-21-2009, 02:08 PM. Reason: Clarity



    • #17
      Hi, maubp

      Thanks very much for your response.

      I just found that it is not a conversion problem. The problem is in the original _sequence.txt files. Although read1.txt and read2.txt are the same size, the numbers of reads in the pair are actually different: read1.txt has more reads than read2.txt.

      Should I drop this lane?

      Or can I use BWA to do the mapping? It seems that BWA maps read1 and read2 separately and then pairs them.

      Thanks again and Merry Xmas



      • #18
        You could try mapping the two files separately, but if there is corruption in the data, this may still fail (and it won't take advantage of the pairing information). Given that read1.txt and read2.txt are the same size but contain different numbers of reads, I'm pretty sure that at least one of the files has been corrupted. If you can't re-download the files, you can probably still fix this, even if you have to throw out some reads. How easy this is will depend very much on your scripting ability and on what exactly has happened to your file(s).



        • #19
          I actually re-downloaded the files and checked the two sequence.txt files. They look good, except that more than 30 reads are missing from read2. Probably those reads didn't pass filtering in Read2.

          Which way do you think is better?
          1: Remove those 30-odd reads from Read1 so that Read1 and Read2 have an equal number of reads.
          2: Don't use this lane.

          Thanks



          • #20
            If the problem is just these 30 reads, then I would just remove the extra ones, so that the two files match up properly (your option 1). After all 30 reads is a tiny fraction of the whole lane - unless you have other reasons to throw out this data (your option 2), I personally would try and salvage most of this lane.
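            One possible way to do option 1 in Python is sketched below. It is only a sketch under a few assumptions: plain four-line FASTQ records, mate names that differ only by a trailing /1 or /2, and enough memory to hold one lane's read names in a set. The function names and the output file names are invented for this example.

```python
def pair_key(header):
    """Turn a FASTQ header line into a mate-independent read name."""
    name = header.strip()
    if name.startswith("@"):
        name = name[1:]
    if name.endswith(("/1", "/2")):  # assumed mate suffix convention
        name = name[:-2]
    return name


def header_keys(handle):
    """Collect the read names from every fourth line (the header lines)."""
    return {pair_key(line) for i, line in enumerate(handle) if i % 4 == 0}


def filter_fastq(in_handle, out_handle, keep):
    """Copy only records whose name is in `keep`; return how many were kept."""
    kept = 0
    while True:
        record = [in_handle.readline() for _ in range(4)]
        if not record[0]:
            return kept
        if pair_key(record[0]) in keep:
            out_handle.writelines(record)
            kept += 1


def salvage_pair(path1, path2, out1, out2):
    """Write new files holding only the reads present in both inputs."""
    with open(path1) as h1, open(path2) as h2:
        keep = header_keys(h1) & header_keys(h2)
    with open(path1) as src, open(out1, "w") as dst:
        kept1 = filter_fastq(src, dst, keep)
    with open(path2) as src, open(out2, "w") as dst:
        kept2 = filter_fastq(src, dst, keep)
    return kept1, kept2
```

            For example, salvage_pair("read1.txt", "read2.txt", "read1.paired.txt", "read2.paired.txt") should return two equal counts if every name occurs once per file. Filtering both files (not just read1) also covers the case where read2 has a few reads that read1 lacks, and the original read order is preserved.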



            • #21
              Originally posted by dawe View Post
              Interesting... can you tell me your system configuration (hardware/software)? Also, can you test whether sol2sanger works? ill2sanger is nothing but a different version of sol2sanger, so a segfault should be raised in that case too.
              Dawe,

              Apologies for the tardy reply. The problem resolved after restarting the machine. Thanks for your response.

              -Harold
