Hi All,
I am a total newbie in this field. I have to assemble RNA-seq data, and before that I need to trim the reads. I have 100 bp Illumina paired-end reads in two files, and I also have the P5 and P7 adapter sequences:
5'-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC-(insert)-ACCTTAAGAGCCCACGGTTCCTTGAGGTCAGTGXXXXXXTAGAGCATACGGCAGAAGACGAAC-3'
I tried Trim Galore, but I can still see this many overrepresented sequences in the FastQC report:
Sequence                                                       Count    Percentage           Possible source
ATGACACTCAAACAGGCATGCTCCACGGAATACCATGGAGCGCAAGGTGC             1155666  2.5956349017221085   No Hit
AATGACGCTCGAACAGGCATGCCCCTCGGAATACCAAGGGGCGCAATGTG             225179   0.5057538004361837   No Hit
AAGACACTCAAACAGGCATGCCTCTCGGAATACCAAGAGGCGCAAGGTGC             218636   0.4910581711090531   No Hit
GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA   119619   0.2686652123616139   Illumina RNA PCR Primer (100% over 50bp)
GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA  111925   0.251384428005364    Illumina RNA PCR Primer (100% over 50bp)
AAATGACGCTCAAACAGGCATGCCCTTTGGAATACCAAAGGGCGCAATGT             104210   0.2340564774843778   No Hit
ACAAACCCTTGTGTCGAGGGCTGACTTTCAATAGATCGCAGCGAGGGAGC             71881    0.16144528987673504  No Hit
GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAA     46463    0.10435626248303084  Illumina RNA PCR Primer (100% over 50bp)
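One quick way to see where the "Illumina RNA PCR Primer" hits come from is to compare them with the P5 adapter given above. The small Python check below is a sketch; the helper name `reverse_complement` is my own, not from any library. It suggests that each of those flagged sequences is the reverse complement of the P5 sequence followed by a short poly-A run, i.e. reads that are essentially all adapter.

```python
# Reverse-complement the P5 adapter from the post and check whether the
# sequences FastQC flagged as "Illumina RNA PCR Primer" start with it.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


P5 = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC"

# Copied from the FastQC overrepresented-sequences table above.
flagged = [
    "GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA",
    "GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA",
    "GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAA",
]

rc_p5 = reverse_complement(P5)
for seq in flagged:
    starts = seq.startswith(rc_p5)
    # Whatever follows the adapter in the flagged sequence.
    tail = seq[len(rc_p5):] if starts else ""
    print(starts, tail)
# prints:
# True AAAAA
# True AAAAAA
# True AAA
```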
How can I remove these overrepresented sequences from my data, and do I need to remove all of them?
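As a rough illustration of what an adapter trimmer does with reads like these, here is a minimal exact-match sketch in Python. The function names `find_adapter_start` and `trim_read`, the `min_overlap`/`min_length` defaults, and the use of the reverse-complemented P5 sequence as the adapter are all illustrative assumptions. Real tools such as cutadapt (which Trim Galore wraps) also tolerate mismatches, do quality trimming, and keep read pairs in sync, so this is not a substitute for them.

```python
from typing import Optional


def find_adapter_start(read: str, adapter: str, min_overlap: int = 3) -> Optional[int]:
    """Return the index where the adapter begins in `read`, or None.

    A full exact match anywhere in the read is accepted; otherwise a
    partial match is accepted if a suffix of the read (at least
    `min_overlap` bases) equals a prefix of the adapter.
    Exact matching only: no sequencing errors are tolerated.
    """
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Try the longest possible partial overlap first so the earliest cut wins.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None


def trim_read(read: str, adapter: str, min_length: int = 20) -> Optional[str]:
    """Cut the read at the adapter; return None if too little insert is left."""
    start = find_adapter_start(read, adapter)
    trimmed = read if start is None else read[:start]
    return trimmed if len(trimmed) >= min_length else None


# Hypothetical example adapter: the reverse complement of P5 from the post.
ADAPTER = "GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT"

dimer = ADAPTER + "AAAAA"          # a read that is pure adapter plus poly-A
print(trim_read(dimer, ADAPTER))   # None: nothing but adapter, so discarded

insert = "ACGT" * 10               # hypothetical 40-nt insert
read = insert + ADAPTER[:12]       # read runs 12 bases into the adapter
print(trim_read(read, ADAPTER))    # prints the 40-nt insert unchanged
```

Reads that are pure adapter (adapter dimers) come back as `None` and would be dropped, while reads with a real insert keep only the part before the adapter.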