Here's my public history: https://main.g2.bx.psu.edu/u/davidkim/h/ngsworkshop
I am trying to learn NGS using the public data at https://www.ncbi.nlm.nih.gov/geo/que...i?acc=GSE39083
FastQC on EBI_SRA__SRP014008_File__SRR518493.fastq.gz__1 found TruSeq adapters in the reads:
Sequence                                    Count   Percentage   Possible Source
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCATCATAT  347588  3.340743581  TruSeq Adapter, Index 2 (97% over 37bp)
CGGCAGTCCACTCCGGTACGCTATCCCACTACTGCCTACCAC  159445  1.532460443  No Hit (possibly ncRNA?)
I ran 'clip sequence' (default settings, output only non-clipped sequences) and was left with only 83% of the reads. But FastQC reported that only 3.3% of the reads were the TruSeq adapter sequence. Why did the 'clip sequence' tool discard 17% of the reads?
Clipping Adapter GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCATCATAT
Input 10404510 reads.
Output 8683051 reads.
discarded 451106 too-short reads.
discarded 386260 adapter-only reads.
discarded 868903 clipped reads.
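To make the numbers easier to compare with the FastQC percentages, here is a rough Python tally of the log above. It is only a sketch: the counts are copied from the log, and the variable names and the `pct` helper are my own.

```python
# Read counts copied from the 'clip sequence' log above.
input_reads = 10_404_510
output_reads = 8_683_051
discarded = {
    "too-short": 451_106,
    "adapter-only": 386_260,
    "clipped": 868_903,
}


def pct(n, total=input_reads):
    """Return n as a percentage of the input reads (hypothetical helper)."""
    return 100.0 * n / total


kept_pct = pct(output_reads)
total_removed = input_reads - output_reads
listed_removed = sum(discarded.values())
# Reads that are missing from the output but not itemised in the log lines shown.
unlisted = total_removed - listed_removed

print(f"kept: {output_reads} ({kept_pct:.2f}%)")
for label, n in discarded.items():
    print(f"discarded {label}: {n} ({pct(n):.2f}%)")
print(f"removed but not itemised: {unlisted} ({pct(unlisted):.2f}%)")
```

The three itemised categories add up to 1706269 reads, while input minus output is 1721459, so 15190 removed reads are not explained by the log lines I pasted.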
Thanks,