Tutorial for SARS-CoV-2 genome data submission to ENA now available
Since the beginning of the COVID-19 pandemic, many scientists have urged that SARS-CoV-2 genome data be shared openly so that the spread of viral variants around the world can be tracked. SARS-CoV-2 sequences have been posted online in large numbers since January 2020.
The Global Initiative on Sharing Avian Influenza Data (GISAID) has been the most popular data-sharing platform and now hosts approximately 8 million SARS-CoV-2 genome sequences. However, GISAID does not allow sequences to be re-shared publicly; only registered users have access to the data stored there. This means that analyses that used sequences in GISAID cannot easily be checked or built upon. We therefore advise sharing sequences both through GISAID and through another popular sequence-sharing platform, the European Nucleotide Archive (ENA), which is run by the European Bioinformatics Institute (EMBL-EBI) and currently hosts around 700,000 SARS-CoV-2 sequences.
A further challenge is that GISAID accepts only “assemblies” (reconstructions of viral genomes from raw data), not the raw data itself. Assembly requires interpreting errors introduced during sequencing, and these interpretations may be incorrect. Access to the raw data could help scientists investigate such issues. We therefore recommend that researchers share both raw and assembled sequencing data; both can be deposited in ENA.
The ENA is part of the International Nucleotide Sequence Database Collaboration (INSDC), along with GenBank in the US and the DNA Data Bank of Japan (DDBJ). INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations, and advocates for openness in data sharing and reuse.
Today, the Swedish COVID-19 Data Portal is launching a tutorial on submitting SARS-CoV-2 sequences to ENA. The tutorial has been developed by the Portal and the NBIS Data Management teams to guide researchers through submitting data (from raw reads to assemblies) to ENA. It caters to both experienced and first-time sequence submitters.
Please note that the Portal team is also happy to act as a broker for researchers who have a large number of sequences that they would like to submit to ENA. We offer this service to Swedish life science researchers. To find out more, send an email to firstname.lastname@example.org and tell us how we can help you.