Tutorial for SARS-CoV-2 genome data submission to ENA

Frequently Asked Questions (FAQs)

Why should I submit sequences?

The availability of sequence data has been vital for tracking the spread of variants, the identification of new variants, and understanding the relative risk to public health posed by different variants. If sequences had not been made available, it would have hindered efforts to fight the pandemic. Specifically, it would have been more difficult to develop effective policies to prevent the spread of the virus (e.g. face mask mandates, travel restrictions), to appropriately allocate public resources (e.g. healthcare resources), and to develop and deploy vaccines, treatments, and tests.

By making sequences openly available and adhering to the FAIR principles (see below) when submitting data, you enable others to reuse your data, and so make a significant contribution to COVID-19 research and to fighting the pandemic.

What are the FAIR principles?

The FAIR principles were established in 2016 to increase the Findability, Accessibility, Interoperability, and Reusability of data.

By submitting data that is FAIR, submitters facilitate the reuse of their data. This is not the same as making data ‘open’, which refers instead to making data openly accessible.

For more information on the FAIR principles, please see the go-fair website.

Where should I submit sequences?

There are two main international databases in which COVID-19 sequences have been made openly available en masse: the Global Initiative on Sharing Avian Influenza Data (GISAID) and the European Nucleotide Archive (ENA).

So, which should you use? We recommend submitting sequences to both databases where possible (see the Introduction tab for details), as each offers advantages for research that the other does not. In some cases, though, this may not be possible. For example, GISAID only accepts assemblies reflecting a consensus sequence, whereas ENA accepts both ‘raw sequences’ and assemblies. Thus, if you have ‘raw’ sequence data, please submit it to ENA.

Work is ongoing to streamline the process for submitting sequences to both databases. Ultimately, we hope to make it as easy to submit to both databases as it is to submit to just one.

Who owns/runs ENA?

ENA is maintained by EMBL-EBI, and is a core data resource of ELIXIR (the European life-sciences Infrastructure for biological Information). See here for more information about what this means.

Is submitting to ENA secure?

Whilst it is considered openly available, access to data submitted to GISAID is restricted to those with verified accounts. Access to data submitted to ENA is not subject to similar restrictions. Some submitters are therefore concerned that submissions to ENA are somehow less secure. This is not the case. To access data in GISAID, users must agree to its terms of use. These could essentially be considered a licence for use, similar to those used for other types of data (e.g. an MIT licence). ENA can therefore be considered to have a ‘more open’ licence, which involves fewer restrictions. In theory, the same users can access data in both databases; the difference is that GISAID data cannot be shared as freely as ENA data. In addition, data in GISAID could also be submitted to ENA.

Can I get help submitting my data to ENA?

Absolutely. Please refer to the Get Help tab to find out where you can get support for your issue.

Can I make the sequence data submitted to ENA visible on the Swedish COVID-19 Data Portal?

Yes, the Swedish COVID-19 Data Portal is happy to display information about sequences deposited by researchers affiliated with a Swedish research institution. If you are interested in this, please get in touch with the team by e-mailing datacentre@scilifelab.se after you submit your sequences.