-
You can make the SMRTportal (or command line) settings used to generate the filtered fastq files available in the methods/supplemental materials.
-
Reply from the ENA DataSubs team: Please submit the fastq or the native package but not both.
It looks like our first SMRT cell raw data has uploaded OK
-
Perhaps this is another question for ENA datasub team.
Having two separate records (one for fastq and other for *.h5 files) may be confusing. Having both in one record makes more sense but sounds like there is no direct way of doing that?
Edit: Unless ENA SRA is going to convert the *.h5 files and make fastq's from them. Again they would have to confirm that.
Last edited by GenoMax; 03-10-2016, 09:55 AM.
-
In this case from two SMRT cells I have one FASTQ file of filtered subreads used in the analysis, but I can easily split it up into one FASTQ file per run based on the read names.
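A minimal sketch of that split, assuming 4-line FASTQ records and PacBio-style read names of the form `@<movie>/<hole>/<start>_<end>`, so that everything before the first `/` identifies the SMRT cell. The function name `split_fastq_by_movie` and the `.subreads.fastq` output suffix are made up for illustration:

```shell
# Hypothetical helper: split a combined FASTQ into one file per SMRT cell.
# Assumes 4-line records and read names like @<movie>/<hole>/<start>_<end>.
split_fastq_by_movie() {
    fastq=$1
    out_dir=$2
    mkdir -p "$out_dir"
    awk -v dir="$out_dir" '
        NR % 4 == 1 {
            name = substr($1, 2)      # drop the leading "@"
            sub(/\/.*/, "", name)     # keep only the movie name
            out = dir "/" name ".subreads.fastq"
        }
        { print > out }               # all four lines follow their header
    ' "$fastq"
}
```

Calling `split_fastq_by_movie filtered_subreads.fastq per_run` would then leave one FASTQ per movie name under `per_run/` (both names here are placeholders).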
-
Originally posted by maubp:
I'm quite willing to, but unsure how they'd want that - I could upload the processed FASTQ as another run?
-
I'm quite willing to, but unsure how they'd want that - I could upload the processed FASTQ as another run?
-
That makes sense.
Are you also submitting fastq/fasta files that went into your analysis (since they would be generated after some filtering etc using SMRTportal or command line tools)?
Do you know how ENA makes original files available for PacBio? On the page where they have fastq files?
-
The DataSubs team replied that for each PacBio SMRT cell run they want three *.bax.h5 files, one *.bas.h5 file, and one *.metadata.xml file.
i.e. something like this for the PacBio example above (using made-up checksum values):
Code:
$ cat run_1_manifest.all
7b382592c46607ec0348bf969ed8b01f  m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.bax.h5
2b912a574ad5e264f781ca495b0b5908  m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.bax.h5
6c7c66e4e2aa1e5516f7d7c16b0ef8b2  m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.bax.h5
3f6067c02aa643eb5d609197defc3baa  m140415_143853_42175_c100635972550000001823121909121417_s1_p0.bas.h5
c12eafa8bf1cc3c1548c1625d9edad7c  m140415_143853_42175_c100635972550000001823121909121417_s1_p0.metadata.xml
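A rough sketch of how a manifest like this could be generated, assuming the directory layout from the first post in this thread (the *.bax.h5 and *.bas.h5 files under Analysis_Results/, the *.metadata.xml one level up) and a system with `md5sum` available. The function name `make_run_manifest` is hypothetical, and whether ENA needs bare file names rather than paths is my assumption based on the example above:

```shell
# Hypothetical helper: write an md5sum-style manifest for one SMRT cell.
# Usage: make_run_manifest CELL_DIR MANIFEST
# Assumes CELL_DIR holds Analysis_Results/ (bax.h5, bas.h5 files) plus the
# metadata.xml file directly inside CELL_DIR.
make_run_manifest() {
    cell_dir=$1
    manifest=$2
    # Run md5sum inside each directory so the manifest lists bare file names.
    {
        (cd "$cell_dir/Analysis_Results" && md5sum *.bax.h5 *.bas.h5)
        (cd "$cell_dir" && md5sum *.metadata.xml)
    } > "$manifest"
}
```

For the example run this would be something like `make_run_manifest /path/to/secondary/storage/2420294/0011 run_1_manifest.all`.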
Update:
Jeena at the EBI Data Submissions team kindly allowed me to post her advice - note the screenshot shows the expected MD5 based manifest file on which I based the example above:
Dear Peter,
A Pac Bio run normally consists of 5 files. They are 3 bax.h5, 1 bas.h5, and the equally important metadata.xml file. If you use Webin you must create a manifest file as explained here:
If you want to reference each file separately per run then please use the REST submission service:
Here is a template for a pac bio run.
Please let us know if you require more help. My colleague Marc is currently away but will be back in the office tomorrow and will be able to provide further help if needed.
Kind regards,
Jeena
-
I've emailed the EBI DataSubs team, and will post back once I know the answer.
-
Thanks - ENA are not clear, but I suspect you're right and they want the *.metadata.xml - and perhaps the *.sts.xml files too (summary statistics).
-
I meant to specifically say metadata.xml (details of the files are described here: https://github.com/PacificBioscience...rvice-provider)
-
When you say *.xml do you mean all of them (at both levels of the directory hierarchy)?
-
You should submit the metadata.xml file because, as I remember, it is difficult (or impossible) to recreate, and that file is needed to import/analyze data in SMRTportal.
The *.h5 files you submit become available as is under the "Download" tab so people can get at the raw data. At least that is how things work in SRA.
-
Uploading PacBio raw data to ENA SRA
Surprisingly, I was not able to find anything on Google about the details of uploading raw PacBio sequence reads to the European Nucleotide Archive (ENA), the EMBL-EBI twin of the Sequence Read Archive (SRA).
http://www.ebi.ac.uk/ena/submit/read...bio_hd5_format just says:
PacBio format
PacBio data submissions are supported in the platform specific native format.
One run consists of *.bax.h5, *.bas.h5 and xml files. Please note that these files must not be tarred.
Code:
/path/to/secondary/storage/2420294/0011
├── Analysis_Results
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.bax.h5 --> ENA
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.log
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.subreads.fasta
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.subreads.fastq
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.bax.h5 --> ENA
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.log
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.subreads.fasta
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.subreads.fastq
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.bax.h5 --> ENA
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.log
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.subreads.fasta
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.subreads.fastq
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.bas.h5 --> ENA
│   ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.sts.csv
│   └── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.sts.xml --> ENA
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.mcd.h5
└── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.metadata.xml
PacBio HDF5
One PacBio HDF5 file is submitted for each run.
Please choose one of the following manifest files present in your drop box. A manifest file ( *.all ) contains all files ( *bas.h5, *.bax.h5 and *.xml ) and their MD5 checksums associated with a single PacBio run. The format of the manifest file must correspond to the output of the md5sum command.
If your file is not listed below, it was either not found in your drop box or its extension was not recognized.
Code:
$ cd /path/to/secondary/storage/2420294/0011/Analysis_Results
$ md5sum *bas.h5 *.bax.h5 *.xml > manifest.all
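Since the manifest has to match the output of `md5sum`, one way to sanity-check it before uploading might be to feed it back to `md5sum -c`, which re-reads each listed file and compares checksums. A small sketch, where the wrapper name `check_manifest` is made up and the manifest is assumed to sit in the same directory as the files it lists:

```shell
# Hypothetical helper: re-verify the files listed in an md5sum-style manifest.
# Usage: check_manifest DIR MANIFEST_NAME
# Returns non-zero if any listed file is missing or its checksum differs.
check_manifest() {
    (cd "$1" && md5sum -c "$2")
}
```

A non-zero exit status here would suggest the manifest is stale or a file was changed or truncated after the checksums were taken.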