**kmcarr** · 08-05-2016, 07:59 AM

Originally posted by Jluis

Dear all,

I have to analyze a set of 26 samples of 16S amplicon data from 250 nt paired-end Illumina HiSeq reads. When I received the sequences they were already demultiplexed, merged and converted into FASTA format. I have no access to the barcode and primer sequences, since the commercial provider who performed the sequencing refuses to provide them (they say it is confidential information).

After extensively reading the QIIME documentation and multiple forum questions about how to analyze this kind of sequence data, I'm afraid I'm one step beyond in the difficulty of this issue (or one step behind by not understanding the information I read... we will see).

I face 2 main problems:

1) The FASTA header of the sequences.

The current header has this format:

>Sample_Name tagX (Where X is the number of each consecutive tag from 1 to N)

After reading the add_qiime_labels documentation (http://qiime.org/scripts/add_qiime_labels.html) I understand that my header is completely different from that in the examples:

>Sample.1_0 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_ AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA

And I have no means of obtaining all the information lacking in my headers.

2) How to create a functional mapping file for QIIME, taking into account my current FASTA headers.

I guess this second issue can be fixed easily if the first issue can be fixed.

Thanks in advance.

JL

JL,

It appears that your service provider has already done all this work for you.

- You do not need to have the barcode sequences because they have already demultiplexed the reads.

- You probably do not need the primer sequences because it is likely they already trimmed the primers as part of the merging process. If they did not state explicitly whether or not the primer sequences were trimmed, ask them. This is essential for you to know.

- The header format they provided you is nearly what you need; just change

Code:

>Sample_Name tagX
to
>Sample_Name_X

[Honestly, QIIME may be perfectly happy with the format of the FASTA deflines already in the file. I don't use QIIME, so I can't say for sure.]

- You can ignore all the other stuff on the example defline in the QIIME manual. The example is a Roche 454 GS-FLX read, and 454 is a dead platform.
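
The header change suggested above could be scripted along these lines. This is only a minimal Python sketch, not something taken from QIIME: the names `relabel_defline` and `relabel_fasta` are made up for illustration, and it assumes every defline looks exactly like `>Sample_Name tagX` with an integer X. Lines that don't match that pattern (including the sequence lines) are passed through untouched.

```python
import re

# Assumed input format: ">Sample_Name tagX", where X is an integer.
DEFLINE = re.compile(r"^>(\S+)\s+tag(\d+)\s*$")


def relabel_defline(line):
    """Turn '>Sample_Name tagX' into '>Sample_Name_X'.

    Any line that does not match the assumed pattern is returned unchanged.
    """
    match = DEFLINE.match(line)
    if match is None:
        return line
    sample, number = match.groups()
    return ">{}_{}\n".format(sample, number)


def relabel_fasta(in_path, out_path):
    """Write a copy of the FASTA file at in_path with relabeled deflines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(relabel_defline(line))
```

Something like `relabel_fasta("merged.fasta", "relabeled.fasta")` (both file names hypothetical) would write a relabeled copy and leave the original file alone, so you can compare the two before feeding anything to QIIME.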

**Jluis** · 08-10-2016, 02:45 AM

Dear kmcarr,

Thank you very much for your answer!
I'm currently on holidays, but I will try to test your solution as soon as I get back to work.

Best

JL

**thermophile** · 08-10-2016, 08:21 AM

Here is how I'm handling demultiplexed data from a MiSeq (I think it should be very similar to HiSeq as far as headers go). Be aware that QIIME uses _ as a field delimiter, so you can't have any underscores in your sample name.
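
If your sample names do contain underscores, one way to deal with them is to swap them for another character before building the labels. The Python sketch below is just an illustration under a couple of assumptions: the helper names `qiime_safe_sample_id` and `safe_id_map` are hypothetical, and periods are used as the replacement on the assumption that QIIME accepts them in sample IDs (check this against your QIIME version). It also refuses to continue if two different names would end up with the same ID.

```python
def qiime_safe_sample_id(name):
    """Replace underscores with periods, e.g. 'Gut_Sample_01' -> 'Gut.Sample.01'."""
    return name.replace("_", ".")


def safe_id_map(sample_names):
    """Map each original sample name to its underscore-free ID.

    Raises ValueError if two different names collapse to the same ID,
    since that would silently merge two samples downstream.
    """
    mapping = {}
    owner = {}  # safe ID -> original name that produced it
    for name in sample_names:
        safe = qiime_safe_sample_id(name)
        if safe in owner and owner[safe] != name:
            raise ValueError(
                "{!r} and {!r} both map to {!r}".format(owner[safe], name, safe)
            )
        owner[safe] = name
        mapping[name] = safe
    return mapping
```

Whatever IDs you end up with need to be used consistently in both the FASTA labels and the mapping file.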

https://github.com/krmaas/bioinformatics/blob/master/qiime.process.txt

I'm not a fan of QIIME, so my script just gets you to the beginning of the clustering process. If you are just starting out with this kind of analysis, I think mothur is much better documented, which makes it easier to learn. Plus, mothur does fully de novo clustering, as opposed to QIIME's approach of closed-reference clustering followed by de novo clustering of the reads that don't match. Clustering your data by two methods based on an incomplete reference is sketchy.

Issue with FASTA header in QIIME