Share data in a public repository
Publish your infectious disease and pandemic preparedness data to make it available to the rest of the research community. The data should be deposited in a public repository together with descriptive metadata. For many biological data types, there are international databases that can be considered de facto standards.
The European Bioinformatics Institute (EBI) hosts many international data repositories, which should be used where appropriate. For data types where no suitable international repository is available, your data can be deposited in the SciLifeLab Data Repository, which is run by the SciLifeLab Data Centre (submissions accepted from all life science researchers in Sweden).
Share data with controlled access
For human data that needs to be stored in a secure environment with controlled access, SciLifeLab can help with publishing and access control (available to all life science researchers at a Swedish academic institution).
Overview of recommended repositories per data type
Below are our data submission guidelines for each specific data type. You can also find similar recommendations per data type in the data submission information on the European COVID-19 Data Portal.
Genomics & transcriptomics data
We suggest that raw virus sequence data, as well as assembled and annotated genomes, are submitted to ENA. To provide further support to users, the Swedish Pathogens Portal team has produced a detailed tutorial on submission to ENA. ENA also provides its own SARS-CoV-2 submission documentation.
Before submitting raw sequence data (e.g. shotgun sequencing), it is necessary to remove contaminating human reads. Host (human) sequence data requires restricted access. NBIS is building a local federated version of the European Genome-phenome Archive (EGA) in Sweden (EGA-SE), which will allow sensitive personal data to be published within a legal framework. Until EGA-SE is available, the dataset should remain in the secure analysis environment (e.g. Bianca at UPPMAX). SciLifeLab can help with publishing and access control.
In any case, we recommend creating a metadata-only record in the SciLifeLab Data Repository that includes contact details for requesting access and for which a DOI (i.e. a persistent identifier) can be issued. The DOI can then be used in the article to refer to the dataset. Once EGA-SE is operational and the dataset has been deposited there, the access information can be updated to point to the EGA ID. See DOI: 10.17044/scilifelab.12292778 for an example.
- The European Nucleotide Archive (ENA)
- ENA SARS-CoV-2 submission tutorial
- SciLifeLab Data Repository for metadata records of sequence data with restricted access
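The step of removing human reads before submission can be illustrated with a minimal sketch. It assumes that a separate tool (for example, mapping reads against a human reference genome) has already produced the set of read IDs flagged as human; that set, and the function names `parse_fastq` and `drop_flagged_reads`, are hypothetical and used only for illustration. This is not the procedure prescribed by ENA or the portal.

```python
from typing import Iterable, Iterator, Set, Tuple

# One FASTQ record: header line, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records, ignoring blank lines."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, sep, qual = stripped[i:i + 4]
        yield header, seq, sep, qual


def drop_flagged_reads(
    records: Iterable[FastqRecord], human_ids: Set[str]
) -> Iterator[FastqRecord]:
    """Keep only records whose read ID is NOT in the flagged set.

    `human_ids` is an assumption: it must come from an upstream
    classification step that is outside the scope of this sketch.
    """
    for record in records:
        # The read ID is the first whitespace-delimited token after '@'.
        read_id = record[0][1:].split()[0]
        if read_id not in human_ids:
            yield record


if __name__ == "__main__":
    example = "@r1\nACGT\n+\nIIII\n@r2 extra\nTTTT\n+\nIIII\n"
    for rec in drop_flagged_reads(parse_fastq(example.splitlines()), {"r2"}):
        print("\n".join(rec))
```

For paired-end data, both mates of a flagged pair would normally need to be removed together, which this sketch does not handle.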
For proteomics data, we recommend using the PRIDE repository provided by the ProteomeXchange Consortium. The repository accepts protein and peptide identification/quantification data with the accompanying mass spectra evidence and any other related data types. Submission is done using the PX Submission Tool.
Other types of proteomics data should also be made available; for these we recommend the SciLifeLab Data Repository. To make the data useful and ready for analysis and integration, provide a detailed description of the data format and of how the variables are organized. Each protein variable should come with a unique identifier, such as a UniProt ID or an Ensembl gene ID (ENSG), and the database versions used to link the data should be stated.
- PRIDE repository and PX Submission Tool
- SciLifeLab Data Repository for other types of proteomics data
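The recommendation that each protein variable carries a unique identifier can be checked before deposition. The sketch below classifies identifiers by format only: it uses the accession pattern published by UniProt and assumes Ensembl human gene IDs of the form `ENSG` followed by 11 digits and an optional version suffix. The function name `classify_identifier` is hypothetical, and a format match does not guarantee that the identifier exists in the database.

```python
import re

# UniProt accession format (6 or 10 characters), as documented by UniProt.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Ensembl human gene ID, optionally with a version suffix such as ".6".
ENSEMBL_GENE_RE = re.compile(r"^ENSG[0-9]{11}(?:\.[0-9]+)?$")


def classify_identifier(identifier: str) -> str:
    """Return 'uniprot', 'ensembl_gene', or 'unknown' based on format only."""
    candidate = identifier.strip()
    if UNIPROT_RE.match(candidate):
        return "uniprot"
    if ENSEMBL_GENE_RE.match(candidate):
        return "ensembl_gene"
    return "unknown"


if __name__ == "__main__":
    for name in ["P0DTC2", "ENSG00000130234.6", "spike"]:
        print(name, classify_identifier(name))
```

Variables classified as `unknown` are candidates for manual review before the dataset description is finalised.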
Depending on the type of image data you have, different public repositories are available; please see the table at BioImage Archive.
We suggest submitting data on bioactive compounds to ChEMBL, a manually curated database of bioactive molecules with drug-like properties run by EMBL-EBI. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.
In cases where data cannot be deposited in a public repository due to privacy restrictions, we suggest creating a metadata-only record in the SciLifeLab Data Repository (submissions accepted from all life science researchers in Sweden) that describes what data is available upon request and how such a request can be made. The repository is managed locally by the SciLifeLab Data Centre, and the record can be assigned a DOI, which can then be cited in the publication.
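As a rough illustration of what a metadata-only record might contain, the sketch below assembles one and checks that the access information is present. The field names and the functions `build_metadata_record` and `doi_to_url` are hypothetical; they are not the SciLifeLab Data Repository's actual schema or API. The only external convention relied on is that a DOI resolves through `https://doi.org/`.

```python
import json

# Hypothetical field names chosen for illustration; a real repository
# defines its own metadata schema.
REQUIRED_FIELDS = (
    "title",
    "description",
    "data_available",
    "access_procedure",
    "contact",
)


def build_metadata_record(**fields: str) -> dict:
    """Return a record dict, or raise ValueError if a required field is empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name, "").strip()]
    if missing:
        raise ValueError("Missing required metadata fields: " + ", ".join(missing))
    return {name: value.strip() for name, value in fields.items()}


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable link for citing in a publication."""
    return "https://doi.org/" + doi.strip()


if __name__ == "__main__":
    record = build_metadata_record(
        title="Example restricted-access dataset",
        description="Summary of the study and the data collected.",
        data_available="Description of the files available upon request.",
        access_procedure="How to apply for access and who decides.",
        contact="data-contact@example.org",  # placeholder address
    )
    print(json.dumps(record, indent=2))
```

Once the repository has issued a DOI for the record, `doi_to_url` gives the link form that can be cited in the article.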
General data repositories
Most life science data types can be published as raw or processed data in repositories at EMBL-EBI. When no archive is suitable, use a general-purpose repository such as the SciLifeLab Data Repository (submissions accepted from all life science researchers in Sweden), Figshare or Zenodo. Besides scientific data, you can publish documents, presentations, figures, protocols, or other information that you want to make public at any stage of the research process. A publication in these repositories is permanent and is assigned a Digital Object Identifier (DOI).
Data sharing support
All researchers affiliated with a university or research institute in Sweden working on research topics relevant to pandemic preparedness can receive free individual consultations and hands-on help within reasonable bounds from the Swedish Pathogens Portal team. Simply send an email to email@example.com. Your question will be assigned to a data steward with relevant expertise who can either help you directly or point you to the correct tool or service.
You are welcome to send both general questions about best approaches to research data management, data management plans (DMPs), reproducibility, FAIR, and open science, and specific questions about your research projects, such as which repository to choose for depositing data, which metadata standards are suitable, and which file formats to use. In some cases the data stewards can act as brokers and submit data to repositories on your behalf.