Tutorial for SARS-CoV-2 genome data submission to ENA

Frequently Asked Questions (FAQs)

Why should I submit sequences?

The availability of sequence data has been vital for tracking the spread of variants, identifying new variants, and understanding the relative risk that different variants pose to public health. Had sequences not been made available, efforts to fight the pandemic would have been hindered. Specifically, it would have been more difficult to develop effective policies to prevent the spread of the virus (e.g. face mask mandates, travel restrictions), to allocate public resources appropriately (e.g. healthcare resources), and to develop and deploy vaccines, treatments, and tests.

By making sequences openly available and adhering to the FAIR principles (see below) when submitting data, you enable others to reuse your data and so make a significant contribution to COVID-19 research and to fighting the pandemic.

What are the FAIR principles?

The FAIR principles were established in 2016 to increase the Findability, Accessibility, Interoperability, and Reusability of data.

By submitting FAIR data, submitters make it easier for others to reuse their data. This is not the same as making data ‘open’, which refers specifically to making data openly accessible.

For more information on the FAIR principles, please see the GO FAIR website.

Where should I submit sequences?

There are two main international databases in which COVID-19 sequences have been made openly available en masse: the Global Initiative on Sharing All Influenza Data (GISAID) and the European Nucleotide Archive (ENA).

So, which should you use? We recommend that you submit sequences to both databases where possible, as each offers advantages for research that the other does not. GISAID contains more SARS-CoV-2 data, from all around the world, than ENA. However, while GISAID only accepts the consensus sequences of assembled genomes, ENA accepts both consensus sequences and ‘raw’ sequence data. Further, although the data in GISAID is considered open, access is restricted to individuals with verified accounts, whilst there are no restrictions on who can access the data in ENA. This means that using data from ENA makes it simpler to share the data (e.g. between members of your group), and access to the data is less likely to be interrupted during a project.
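To illustrate what unrestricted access to ENA means in practice, the sketch below builds a search URL for the ENA Portal API that anyone could open without an account. It is a minimal example rather than an official recipe: the function name, the chosen fields, and the default limit are assumptions made for illustration, and the available results and fields should be checked against the ENA Portal API documentation before use.

```python
from urllib.parse import urlencode

# Public search endpoint of the ENA Portal API (no login required).
ENA_PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"

# NCBI Taxonomy identifier for SARS-CoV-2.
SARS_COV_2_TAXON_ID = 2697049


def build_ena_search_url(limit=5):
    """Return a URL listing a few SARS-CoV-2 sequence records from ENA.

    The selected fields are an assumption for this example; consult the
    ENA Portal API documentation for the fields each result type supports.
    """
    params = {
        "result": "sequence",                         # assembled/annotated sequences
        "query": f"tax_tree({SARS_COV_2_TAXON_ID})",  # this taxon and its descendants
        "fields": "accession,collection_date,country",
        "format": "tsv",
        "limit": limit,
    }
    return f"{ENA_PORTAL_SEARCH}?{urlencode(params)}"


# Print the URL; it can be opened in a browser or fetched with any HTTP client.
print(build_ena_search_url())
```

The example only constructs the URL and does not download anything, so it can be run offline; fetching the result is left to whichever HTTP client you prefer.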

Who owns/runs ENA?

ENA is maintained by EMBL-EBI, and is a core data resource of ELIXIR (the European life-sciences Infrastructure for biological Information). ENA is part of the INSDC (International Nucleotide Sequence Database Collaboration), and also indexes data from NCBI (National Center for Biotechnology Information) and DDBJ (DNA Data Bank of Japan).

Is submitting to ENA secure?

Whilst it is considered openly available, access to data submitted to GISAID is restricted to those with verified accounts. Access to data submitted to ENA is not subject to similar restrictions. Some submitters are therefore concerned that submissions to ENA are somehow less secure. This is not the case. To access data in GISAID, users must agree to its terms of use. These terms could essentially be considered a licence for use, similar to those used for other types of data (e.g. an MIT licence). ENA can therefore be considered to have a ‘more open’ licence, which involves fewer restrictions. In theory, the same users can access data in both databases; the difference is that GISAID data cannot be shared as freely as ENA data.

Can I get help submitting my data to ENA?

Absolutely. Please refer to the Get Help tab to find where you can get support for your issue.

Can I make the sequence data submitted to ENA visible on the Swedish Pathogens Portal?

Yes, the Swedish Pathogens Portal is happy to display information about sequences deposited by researchers affiliated with a Swedish research institution. If you are interested in this, please get in touch with the team by e-mailing pathogens@scilifelab.se after you have submitted your sequences.