  • nike00
    replied
    Originally posted by Brian Bushnell View Post
    If you download the BBMap package, the adapters are in the resources directory - nextera.fa.gz, truseq.fa.gz, and truseq_rna.fa.gz. You can use all of them with the flag "ref=nextera.fa.gz,truseq.fa.gz,truseq_rna.fa.gz" (with the appropriate paths).
    Thank you very much!



  • gauravdube
    replied
    Hi Gabriel,

    Thank you so much. It worked for me.

    Originally posted by gab0 View Post
    Hi gauravdube:

    I found and used tools from the BBMap package. Brian helped me out, guiding me on how to use the bbduk tool.

    I used the following command line: bbduk.sh -Xmx4g -in=(file).fastq.gz -in2=(file).fastq.gz ref=adapters.fa -out=out1.fastq -out2=out2.fastq

    Adapters file has all the adapters that I could find for Illumina platforms, including the control sequences from the libraries, in fasta format. That worked for me, hopefully will work for you too!

    Best regards,

    Gabriel



  • Brian Bushnell
    replied
    Originally posted by nike00 View Post
    Dear Gabriel,

    Very interesting post. I would like to know if you have a list of the Illumina adapters, and of the control sequences as well, to use as the adapters.fa file. I cannot find them anywhere.

    Thanks a lot,
    nike00
    If you download the BBMap package, the adapters are in the resources directory - nextera.fa.gz, truseq.fa.gz, and truseq_rna.fa.gz. You can use all of them with the flag "ref=nextera.fa.gz,truseq.fa.gz,truseq_rna.fa.gz" (with the appropriate paths).
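
For anyone scripting this, here is a minimal Python sketch that assembles a bbduk.sh call with all three adapter files joined into one ref= value. The adapter file names and the in/in2/ref/out/out2 parameters come from this thread; the `bbmap/resources` location, the function name `bbduk_command`, and the read file names are assumptions for illustration only. The sketch only builds the argument list and does not run anything.

```python
from pathlib import PurePosixPath

# Adapter files named in the post above (shipped in BBMap's resources directory).
ADAPTER_FILES = ["nextera.fa.gz", "truseq.fa.gz", "truseq_rna.fa.gz"]


def bbduk_command(resources_dir, in1, in2, out1, out2, memory="4g"):
    """Build the argument list for a paired-end bbduk.sh run.

    resources_dir is wherever BBMap was unpacked (an assumption here).
    """
    # Join the adapter files, with their paths, into one comma-separated value.
    ref = ",".join(str(PurePosixPath(resources_dir) / name) for name in ADAPTER_FILES)
    return [
        "bbduk.sh",
        f"-Xmx{memory}",
        f"in={in1}",
        f"in2={in2}",
        f"ref={ref}",
        f"out={out1}",
        f"out2={out2}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command.
    cmd = bbduk_command("bbmap/resources", "reads_1.fastq.gz", "reads_2.fastq.gz",
                        "out1.fastq", "out2.fastq")
    print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once the paths are adjusted to the local installation.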



  • NextGenSeq
    replied
    It looks like Nextera bias to me.



  • nike00
    replied
    Originally posted by gab0 View Post
    Hi gauravdube:

    I found and used tools from the BBMap package. Brian helped me out, guiding me on how to use the bbduk tool.

    I used the following command line: bbduk.sh -Xmx4g -in=(file).fastq.gz -in2=(file).fastq.gz ref=adapters.fa -out=out1.fastq -out2=out2.fastq

    Adapters file has all the adapters that I could find for Illumina platforms, including the control sequences from the libraries, in fasta format. That worked for me, hopefully will work for you too!

    Best regards,

    Gabriel
    Dear Gabriel,

    Very interesting post. I would like to know if you have a list of the Illumina adapters, and of the control sequences as well, to use as the adapters.fa file. I cannot find them anywhere.

    Thanks a lot,
    nike00



  • gab0
    replied
    Originally posted by gauravdube View Post
    Hi gab0,

    I am facing exactly the same k-mer content issue, so I didn't create a separate thread when I came across yours. My question to you is: which tool did you use to retain the valid duplicate reads and remove only the control reads? Thanks in advance.
    Hi gauravdube:

    I found and used tools from the BBMap package. Brian helped me out, guiding me on how to use the bbduk tool.

    I used the following command line: bbduk.sh -Xmx4g -in=(file).fastq.gz -in2=(file).fastq.gz ref=adapters.fa -out=out1.fastq -out2=out2.fastq

    Adapters file has all the adapters that I could find for Illumina platforms, including the control sequences from the libraries, in fasta format. That worked for me, hopefully will work for you too!

    Best regards,

    Gabriel



  • gauravdube
    replied
    Hi gab0,

    I am facing exactly the same k-mer content issue, so I didn't create a separate thread when I came across yours. My question to you is: which tool did you use to retain the valid duplicate reads and remove only the control reads? Thanks in advance.



  • gab0
    replied
    Originally posted by nucacidhunter View Post
    Apart from Kmer content, every parameter looks fine in the FastQC report. The number of over-represented Kmers is low (although it is unusual to see them in balanced genomes), and I do not think it should be of any concern. The over-represented Kmers could come from duplicate reads (there is a small bump in %total sequences in the duplication plot at >10), which can be checked by removing duplicates and running FastQC again, or they could result from bias in at least one step of library prep due to the AT-rich nature of the genome. Whether duplicates should be removed or not depends, I think, on the downstream application, and I will let a bioinformatician comment on it.
    Hi

    thanks for your help! So apart from the Kmer problem, the files look ok for downstream analysis.

    Well, I've found and (partially) fixed the kmer problem, so here I'll write up how I solved it:

    When checking the files with FastQC V0.11.2, I saw this strange kmer pattern. When checking the Kmers, I figured out that they were displaced by 1bp, so I started to assemble (just by eye) the Kmer sequence. Then, searching for the Kmer pattern with grep, I found that there were some repeated sequences/reads, like this one:

    "ACTAGTATGGCCCGGGGGATCCTACGTTCCAAATGCAGCGAGCTCGTATAACCCTTTAAGAGTTGCTCTTTTTGTTTGGTAAGTTGCAAATCGAAGTTTTA"

    Looking further I found a variant of this read, like this one

    "AGTATGGCCCGGGGGATCCTACGTTCCAAATGCAGCGAGCTCGTATAACCCTTTAAGAGTTGCTCTTTTTGTTTGGTAAGTTGCAAATCGAAGTTTTAGAT"

    As you can see, the variant is shifted by 3bp at the 5' and 3' ends.
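
The shift between two such reads can also be checked programmatically instead of by eye. Below is a minimal Python sketch; the function name `find_shift` and the `min_overlap` threshold are assumptions made for illustration, and the two reads are the ones quoted above.

```python
def find_shift(read_a, read_b, min_overlap=20):
    """Return the smallest s such that read_a[s:] is a prefix of read_b.

    Returns None if no suffix of read_a of at least min_overlap bases
    matches the start of read_b.
    """
    for shift in range(len(read_a) - min_overlap + 1):
        if read_b.startswith(read_a[shift:]):
            return shift
    return None


# The two reads quoted in the post.
READ = ("ACTAGTATGGCCCGGGGGATCCTACGTTCCAAATGCAGCGAGCTCGTATAACCCTTTAAGAG"
        "TTGCTCTTTTTGTTTGGTAAGTTGCAAATCGAAGTTTTA")
VARIANT = ("AGTATGGCCCGGGGGATCCTACGTTCCAAATGCAGCGAGCTCGTATAACCCTTTAAGAG"
           "TTGCTCTTTTTGTTTGGTAAGTTGCAAATCGAAGTTTTAGAT")

if __name__ == "__main__":
    shift = find_shift(READ, VARIANT)
    print("shift between the reads:", shift)
    if shift is not None:
        # Bases of the variant that extend past the end of the first read.
        print("extra 3' bases in the variant:", VARIANT[len(READ) - shift:])
```

Two reads that are the same fragment seen at slightly different start positions point to a common template rather than to independent genomic fragments.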

    When searching the web again, I found a document from Illumina, the Illumina customer sequence letter. There I found some sequences that matched my reads, listed as: "Process Controls for TruSeq® Sample Preparation Kits Included in TruSeq DNA and RNA (v1/v2/LT/HT) and TruSeq Exome Kits"

    So it seems that these reads came in as part of the library control, and they were not filtered by the sequencing facility.

    I tested out a couple of tools for removing the repeated reads. I used fastx_collapser, but it turns out that it produces FASTA files as output, not FASTQ files. Then I tested Fastq-mcf, which filtered out all the repeated reads, both the valid duplicates and the control library reads.

    After filtering out the repeated reads, now I had some FASTQ files without kmer warnings. Yoo-hoo!

    Now I have to search for another tool to remove only the control reads while keeping the valid duplicate reads. I was thinking of using prinseq to remove these reads.
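
The idea of dropping only control reads while leaving genuine duplicates alone can be sketched as k-mer matching against the known control sequences. This is a simplified illustration in Python and not the implementation of any particular tool; the function names and the choice of `k` are assumptions, and reads are represented as plain sequence strings rather than FASTQ records.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (upper-case A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_control_index(control_seqs, k):
    """Collect every k-mer of the control sequences, on both strands."""
    index = set()
    for seq in control_seqs:
        for strand in (seq.upper(), reverse_complement(seq.upper())):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def hits_control(read, index, k):
    """True if the read shares at least one k-mer with the control index."""
    read = read.upper()
    return any(read[i:i + k] in index for i in range(len(read) - k + 1))


def filter_pairs(pairs, index, k):
    """Keep a read pair only if neither mate matches a control k-mer.

    Identical pairs that do not match the controls are all kept, so
    valid duplicates survive the filter.
    """
    return [
        (r1, r2) for r1, r2 in pairs
        if not (hits_control(r1, index, k) or hits_control(r2, index, k))
    ]
```

Dropping the whole pair when either mate matches keeps the two output files in sync, which is what a paired-end filter needs.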

    Thanks for your help!



  • nucacidhunter
    replied
    Apart from Kmer content, every parameter looks fine in the FastQC report. The number of over-represented Kmers is low (although it is unusual to see them in balanced genomes), and I do not think it should be of any concern. The over-represented Kmers could come from duplicate reads (there is a small bump in %total sequences in the duplication plot at >10), which can be checked by removing duplicates and running FastQC again, or they could result from bias in at least one step of library prep due to the AT-rich nature of the genome. Whether duplicates should be removed or not depends, I think, on the downstream application, and I will let a bioinformatician comment on it.



  • gab0
    replied
    Hi nucacidhunter:

    Thanks for replying. I'll answer by quoting what you posted.

    Originally posted by nucacidhunter View Post
    What kit was used for library prep
    I sent the samples to another, external facility and I don't know which kit they used, so I'll find out ASAP.

    I asked them to sequence my library on an Illumina HiSeq 2000 machine, in paired-end runs (2x100bp). As I found out when I received my reads, from the index and the adapter sequences that were sent to me later, they multiplexed the samples.

    Originally posted by nucacidhunter View Post
    and could you post FastQC plots for per sequence GC content, sequence duplication levels and Illumina adapters.
    They did tell me the adapters used (when asked!), which were these:

    TruSeq Universal Adapter

    5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

    TruSeq Adapter, Index 5

    5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG

    Attached to the post are the plots for both read files. I have uploaded the plots for the forward and reverse files (the 2nd plot of each category is the reverse plot).

    Finally the kmer content

    These files should let you download the full FastQC report (Ver 0.11.2) in case you want to see it

    Thank you very much,

    Gabriel
    Last edited by gab0; 08-07-2014, 07:20 AM.



  • nucacidhunter
    replied
    What kit was used for library prep, and could you post FastQC plots for per sequence GC content, sequence duplication levels and Illumina adapters?



  • weird kmer content in 5' end from genomic DNA PE reads

    Hello

    My name is Gabriel. I have asked this previously in the Illumina subforum but it seems that my post belongs here.

    I'm writing because I'm analyzing Illumina reads (generated on a HiSeq 2000) from the genome of a particular insect species. The sequencing facility gave me the FASTQ files without adapters, but when checking the filtered FASTQ files with the latest FastQC version (V 0.11.2) I see a weird kmer pattern in the 5' region. It seems that a particular sequence is overrepresented, but the overrepresented sequences module does not show anything weird.

    Also, the overrepresented Kmers seem to have a strong bias towards GC (e.g. GGCCCGG, GCCCGGG and so on). I've also managed to overlap the Kmers into this sequence: CTAGTATGGCCCGGGGGATCC, but so far I've not been able to find anything related to it. I'm unsure whether it is OK to just trim this sequence, as I don't know what this particular pattern means. This sequence is present in both paired-end files, and FastQC shows the kmer content peak at the 5' end of both files.
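
Overlapping k-mers that are each shifted by one base can be chained automatically. Here is a minimal Python sketch of that idea; the function name `merge_shifted_kmers` is hypothetical, it assumes all k-mers have the same length, and it stops as soon as the chain becomes ambiguous. It is a toy illustration, not a real assembler.

```python
def merge_shifted_kmers(kmers):
    """Chain equal-length k-mers that overlap by k-1 bases into one sequence."""
    kmers = list(dict.fromkeys(kmers))  # drop repeats, keep input order
    if not kmers:
        raise ValueError("no k-mers given")
    k = len(kmers[0])
    if any(len(km) != k for km in kmers):
        raise ValueError("all k-mers must have the same length")

    # Index k-mers by their first k-1 bases so the next one can be looked up.
    by_prefix = {}
    for km in kmers:
        by_prefix.setdefault(km[:-1], []).append(km)

    # The chain starts at the k-mer that no other k-mer leads into.
    suffixes = {km[1:] for km in kmers}
    starts = [km for km in kmers if km[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("could not find a unique starting k-mer")

    contig = starts[0]
    used = {starts[0]}
    while True:
        candidates = [km for km in by_prefix.get(contig[-(k - 1):], [])
                      if km not in used]
        if len(candidates) != 1:  # dead end or ambiguous branch: stop here
            break
        contig += candidates[0][-1]  # each step adds exactly one base
        used.add(candidates[0])
    return contig


if __name__ == "__main__":
    # The two 7-mers mentioned in the post overlap by 6 bases.
    print(merge_shifted_kmers(["GCCCGGG", "GGCCCGG"]))
```

Given the full list of flagged k-mers from the FastQC report, the same function would extend the chain one base at a time in both directions of the list order, since the starting k-mer is found automatically.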

    When searching for this pattern with grep in my files, I noticed that there are several reads that seem to be duplicated, as the read sequence remains the same. I don't know if these duplicated reads should be removed or left.
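
The same kind of search can be done with a short script that tallies identical reads containing the motif. This is a minimal Python sketch under the assumption of plain four-line FASTQ records (optionally gzipped); the function names and the file name in the usage lines are hypothetical.

```python
import gzip
from collections import Counter


def read_sequences(path):
    """Yield the sequence line of each record in a four-line FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # 2nd line of every record is the sequence
                yield line.strip()


def duplicated_reads_with_motif(sequences, motif, min_count=2):
    """Return (sequence, count) for reads containing motif seen >= min_count times."""
    counts = Counter(seq for seq in sequences if motif in seq)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python count_motif.py reads_1.fastq.gz
    if len(sys.argv) > 1:
        for seq, n in duplicated_reads_with_motif(read_sequences(sys.argv[1]), "GGCCCGG"):
            print(n, seq)
```

Sorting by count puts the most repeated reads first, which makes it easy to pick candidates to compare against known adapter or control sequences.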

    So far, during my web search, I've only seen similar Kmer patterns in RNA-seq data, which is not my case. Also, the "bad sequence" example on the FastQC webpage shows a similar pattern, but at the 3' end, not in the 5' region as in my data.

    It is worth noting that I have Paired end (2x100) files, and both files (1 and 2) have the same pattern.

    I have attached the Kmer module graphs in these links:

    I can add more information if needed.

    Thank you very much, (and sorry for my English :P)
