  • A strange size difference of fastq file

    Hi, I'm currently working with Illumina sequencing data in fastq format. I downloaded it from a publicly available database (TCGA) as a zipped file. After unzipping and trimming, the file is about 16G. Here is the interesting part: after I copied this file to another partition, the new copy was only 7.6G. The number of lines, the number of reads, and the read length distribution are the same in both files, so I assume the two files have the same content and the new copy is not truncated.

    Moreover, when I run Tophat2/Cufflinks on the 16G copy, it takes much longer to finish and the result looks strange, whereas everything looks normal with the 7.6G copy. This might not be a bioinformatics question, but it's quite interesting. What happened to the file? What could the extra data in the larger file be?

    Thanks a lot.

  • #2
    I can't tell... But one thing you can try to get some hints is:

    Code:
    cat -vet my_strange_reads.fq | less
    This will show you non-printable characters in the file (with these flags, each line end is displayed as $ and each tab as ^I). In a typical fastq file you shouldn't see anything beyond the usual alphanumeric characters and a few metacharacters in the read names.

    In practice, I would download the file again, in case something got corrupted in the process.
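
Paging through a 16G file by eye is slow, so here is a rough sketch of how the two copies could be compared by counting instead. The function name `fq_summary` and the file names in the comments are made up for illustration. It assumes that the extra bytes, if any, are NUL bytes or carriage returns, which is only a guess; other causes are possible.

```shell
# Hypothetical helper: print byte, line, NUL-byte and carriage-return
# counts for one file, so that two copies can be compared side by side.
fq_summary() {
    f="$1"
    bytes=$(wc -c < "$f" | tr -d ' ')                 # total size in bytes
    lines=$(wc -l < "$f" | tr -d ' ')                 # number of newline characters
    nuls=$(tr -dc '\000' < "$f" | wc -c | tr -d ' ')  # keep only NUL bytes, then count them
    crs=$(tr -dc '\r' < "$f" | wc -c | tr -d ' ')     # keep only carriage returns, then count them
    printf '%s bytes=%s lines=%s nul=%s cr=%s\n' "$f" "$bytes" "$lines" "$nuls" "$crs"
}

# Example usage (file names are placeholders):
#   fq_summary reads_16G.fq
#   fq_summary reads_7.6G.fq
#   cmp reads_16G.fq reads_7.6G.fq   # reports the first differing byte, if any
```

If the byte counts match and `cmp` reports no difference, the content is identical and the size gap is probably in how each partition stores or reports the file, not in the reads themselves. If the counts differ while the line counts agree, the `nul` and `cr` columns give a first hint at what the extra bytes are.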
