EMPIAR Policies and Processing Procedures

Authored by the EMPIAR team (empiar-help@ebi.ac.uk)

Version:

Date: 2023/10/23

Valid from: 2023/10/23

Valid until: superseded by revised document

Preface

This document details the policies and procedures governing the Electron Microscopy Public Image Archive (EMPIAR), a public repository for raw two-dimensional image data from 3D bioimaging experiments as well as certain 3D bioimaging datasets. The repository organises its data into EMPIAR entries, and this document covers the policies regarding what constitutes an EMPIAR entry, the requirements for depositing entries into the repository, the processing of entry data and, ultimately, the provision of data to the public. Policy queries should cite the date and version given above and should be sent to the EMPIAR team at empiar-help@ebi.ac.uk. Such queries may concern issues not currently covered by this document, inconsistencies within it, requests for clarification, suggestions for improvement, and discussions of potential exceptions.

Table of contents

Preface

Table of contents

I. Introduction

II. Definitions

A. Abbreviations

B. Terms

1. Entry requirements

1.1 Entry acceptance.

1.2 Currently supported entry types

1.3 Entry association.

1.4 Entry auxiliary data.

2. Entry ownership and authorship

2.1 User roles and permissions.

2.2 Deposition owner and authors.

3. Deposition requirements

3.1 Deposition acceptance.

3.2 Data file requirements.

4. Deposition and accession code assignment

4.1 EMPIAR deposition code.

4.2 EMPIAR accession code.

5. Processing procedures

5.1 Deposition process.

5.2 Processing procedure.

5.3 Release process.

5.3.1 Release instruction.

5.3.2 Status codes.

5.3.3 Procedure for release.

6. Modification of entries

6.1 Entry changes before release.

6.2 Entry changes after release.

6.2.1 Entry changes requested by EMPIAR after release.

6.3 Abandoned unlocked entries.

7. Removal of entries

7.1 Entry obsoletion.

7.2 Entry withdrawal.

7.3 Entry withdrawal after the hold period expires.

7.4 Entry removal in unusual circumstances*.

I. Introduction

The Electron Microscopy Public Image Archive (EMPIAR) is a public resource for raw, 2D images from molecular and cellular 3D bioimaging experiments, as well as some 3D datasets, obtained using transmission or scanning electron microscopy and electron or X-ray tomography. All data in EMPIAR is freely and publicly available to the global community under the CC0 licence (https://creativecommons.org/share-your-work/public-domain/cc0/).

As part of EMBL-EBI, EMPIAR is committed to placing all primary and derived data in the public domain. The stated mission of EMBL-EBI includes the provision of freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress.

This document outlines current policies governing EMPIAR deposition, processing and release. The EMPIAR staff will continue to update annotation practices in line with evolving structure determination techniques and annotation methods.

II. Definitions

A. Abbreviations

Abbreviation

Meaning

3DSEM

Three-Dimensional Scanning Electron Microscopy (e.g., FIB - Focussed Ion Beam and SBF - Serial Block Face)

CLEM

Correlative Light and Electron Microscopy

CLXM

Correlative Light and X-ray Microscopy

EM

Electron Microscopy, usually understood to be transmission cryo-electron microscopy or tomography

EMDB

Electron Microscopy Data Bank

EMPIAR

Electron Microscopy Public Image Archive

LM

Light microscopy

PDB

Protein Data Bank

PDB-Dev

A prototype deposition and archiving system for structural models and associated experimental data and metadata obtained through integrative/hybrid (I/H) methods

PI

Principal Investigator

SXT

Soft X-ray Tomography

B. Terms

Term

Description

Owner

The (scientific) owner of a deposition must be a PI who is scientifically responsible for the study that generated the data. There may be more than one PI designated as owners. Owners do not need to have an account on the EMPIAR deposition system (although they may)

Depositor

The person who owns an EMPIAR account that is associated with a specific deposition and who has permission to modify the deposition. The depositor may also be the owner (or one of the owners), but need not be. There is only one depositor for any given deposition

Author

Any person, designated by the owner(s), who has in any way contributed to the data in an EMPIAR entry. Authors do not need to have an account on the EMPIAR deposition system (although they may)

Public (EMPIAR) archive

The authoritative public copy of the EMPIAR archive as maintained by and at EMBL-EBI and from which any and all public EMPIAR files can be downloaded. There may be official mirror sites of the public archive

Mirror (EMPIAR) site

An official replica of the public (EMPIAR) archive, established by agreement. A mirror site may or may not distribute all the data. Example: https://empiar.pdbj.org/

Operators of the (EMPIAR) archive

EMBL-EBI, the organisation that acts as archive keeper of the public archive. They can be contacted via email to empiar-help@ebi.ac.uk

1. Entry requirements

1.1 Entry acceptance.

An EMPIAR entry is a set of microscopy files that are tied together logically. The main unifying characteristic of these files can be a publication, an EMDB map that has been obtained from them or a specific experiment during which they have been obtained. A single EMPIAR entry may contain multiple image sets, so it can refer to, for example, multiple EMDB entries, or multiple publications. Alternatively, an EMPIAR entry may be tied to a single publication or EMDB entry.

An entry comprises one or more image sets, where each image set is one complete set of images used, for example, to obtain one of the associated EMDB entries or to make a single 3D reconstruction from an SBF-SEM experiment. An example of an image set could be a collection of multi-frame micrographs, a stack of particle images, or a tomogram obtained with SXT. Each image set should reside in a separate directory.

1.2 Currently supported entry types

EMDB - raw image data relating to structures deposited to the Electron Microscopy Data Bank

SBF-SEM - image data collected using serial block-face scanning electron microscopy (e.g. with the Gatan 3View system)

SXT - image data collected using soft X-ray tomography

FIB-SEM - image data collected using focused ion beam scanning electron microscopy

IHM - integrative hybrid modelling data

CLEM - correlative light and electron microscopy. In this case any light-microscopy data should be deposited to the BioImage Archive first; the EM data should then be deposited to EMPIAR, with the BioImage Archive accession ID provided as a cross-reference

CLXM - correlative light and X-ray microscopy. In this case any light-microscopy data should be deposited to the BioImage Archive first; the XM data should then be deposited to EMPIAR, with the BioImage Archive accession ID provided as a cross-reference

MicroED - microcrystal electron diffraction

ATUM-SEM - Automated Tape-collecting Ultramicrotome Scanning Electron Microscopy

Hard X-ray/X-ray microCT - Hard X-ray/X-ray micro-computed tomography

ssET - serial section electron tomography

1.3 Entry association.

All EMPIAR entries (with certain exceptions, see below) are required to be associated with one or more EMDB entries. “Associated” in this context means that the EMPIAR entry contains the image data used to obtain the 3D reconstruction(s) deposited as one or more EMDB entries. In such cases depositors are encouraged to inform EMDB and PDB (as appropriate) about the EMPIAR accession.

EMPIAR will accept data that is not associated with an EMDB entry in the following cases:

      Biological 2D/3D data from 3D electron and X-ray imaging modalities not covered by EMDB (e.g. 3DSEM or SXT);

      2D EM data used in integrative/hybrid methods, associated with a structure deposited in the PDB or PDB-Dev archive;

      Certain reference and benchmark datasets (to be decided on a case-by-case basis)*;

      Datasets used for certain community challenges (such as the 2015 Map Validation Challenge, see: “The first single particle analysis Map Challenge: A summary of the assessments,” J. Struct. Biol. 204 (2018), 291-300, https://doi.org/10.1016/j.jsb.2018.08.010)*.

* We are keen to support community challenges and archival of reference data sets. Please contact the operators of the EMPIAR archive prior to deposition.

Data that is out of scope for EMPIAR includes (but is not limited to):

      Data on non-biological samples;

      Data outside the scale range from molecules to organisms;

      Clinical and any patient-identifiable data;

      Light-microscopy data and any non-microscopy data (this should be deposited to the BioImage Archive, https://www.ebi.ac.uk/bioimage-archive/; this also applies to the LM data from correlative experiments such as CLEM and CLXM);

      Data on “small molecules” (depending on their nature, this might be accepted by the PDB, NDB, CCDC or other archives).

In cases not covered above, please contact the operators of the EMPIAR archive prior to deposition to discuss the potential suitability of EMPIAR for your data.

1.4 Entry auxiliary data.

Auxiliary data such as files containing particle coordinates, segmentations, gain reference, video guides through the dataset in orthogonal planes, Blender projects, motion correction and gain normalisation scripts may also be included as a part of the entry. If in doubt, please contact the operators of the EMPIAR archive.

2. Entry ownership and authorship

2.1 User roles and permissions.

The EMPIAR deposition system is user-based. Individual users sign on to the system, and can create and handle multiple depositions. Users may also invite other users to share access to a deposition with varying degrees of access privileges.

2.2 Deposition owner and authors.

The PI(s) scientifically responsible for the study are designated as the owner(s) of the deposition. The owner can delegate the upload and entry of data and metadata to another person. The EMPIAR entry and associated EMDB entry (or entries) must have the same PI/owner. It is the owner’s responsibility to make sure that all the publication authors, EMDB entry authors and EMPIAR entry authors have consented to the deposition of the data to EMPIAR. The owner of the deposition is also ultimately responsible for making sure that the information provided during the deposition process is correct, and that all citation authors, EMDB entry authors and EMPIAR entry authors have consented to the owner acting on their behalf.

All communication regarding a deposition from the EMPIAR annotation team will be addressed to the owner(s) and the depositor. It is the owner’s responsibility to make sure that information is then further channelled to the appropriate authors. The names and ORCID iDs* of all authors associated with the EMPIAR entry are collected and made public upon release. Additional information about the corresponding author and PI/owner (including institutional address, email and phone numbers) is collected during the deposition process but not made public. It is however stored to enable future communications regarding the entry. The corresponding author of the paper must be one of the authors associated with the deposition, but does not necessarily have to be the owner of the deposition. The owner(s) (provided they have an account) and depositor can invite an anonymous reviewer to inspect an EMPIAR deposition.

* https://orcid.org/; ORCID iDs are persistent, unique digital identifiers for researchers

3. Deposition requirements

3.1 Deposition acceptance.

We highly recommend that the full raw data is deposited, with each image set in a separate directory. For 3D bioimaging depositions not related to an EMDB entry, we also recommend that the 3D reconstruction is deposited. Every EMPIAR entry must have a thumbnail image representative of the entry; the depositor is required to provide such an image (not subject to any copyright restrictions) in PNG, JPEG, TIFF or GIF format, with a minimum size of 400 x 400 pixels.
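As an illustration of the thumbnail size requirement, the following Python sketch reads the dimensions of a PNG image using only the standard library and compares them with the 400 x 400 pixel minimum. The function names are hypothetical and are not part of any EMPIAR tool. The sketch handles PNG only; checking JPEG, TIFF or GIF files would need additional parsing or an imaging library.

```python
import struct
from typing import Tuple

MIN_SIDE = 400  # minimum width and height in pixels (section 3.1)
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk at the start of PNG bytes."""
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", 4-byte width, 4-byte height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def png_meets_thumbnail_minimum(data: bytes) -> bool:
    """Check that both sides of the PNG are at least MIN_SIDE pixels."""
    width, height = png_size(data)
    return width >= MIN_SIDE and height >= MIN_SIDE
```

Only the first 24 bytes of the file are needed, so a file can be checked with `png_meets_thumbnail_minimum(open(path, "rb").read(24))`, where `path` is the depositor's thumbnail file.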

3.2 Data file requirements.

Data files should preferably be deposited in one of the following formats. Any exception should be discussed with and approved by EMPIAR beforehand.

File type

File extension(s)

Example software to open these files

Can contain

Image data

MRC individual images

.mrc

Fiji

Tilt series, micrographs, picked particles

MRC stacks

.mrcs

Fiji

Micrographs, picked particles

DM3/DM4 stacks

.dm3/.dm4

Fiji

Micrographs, class averages

TIFF/TIF individual images

.tiff/.tif

Fiji

Micrographs

SPIDER individual images

.spi

EMAN2

Micrographs

SPIDER stacks

.spi

EMAN2

Picked particles

Big data viewer HDF5

.h5

Fiji, DAWN

Micrographs, 3D volumes

IMAGIC stacks

.hed and .img – both have to be provided

EMAN2

Picked particles

EER

.eer

RELION

Micrographs

XML

.xml

Fiji

Text files

SMV

.smv

ADXV

Diffraction patterns

REC

.rec

IMOD

Tomograms

NXS

.nxs

Diffractem

Diffraction patterns

Auxiliary data

EMDB-SFF

.xml

Fiji

Segmentations and annotations

Amira

.am

Amira

Segmentations

VTK

.vtk

VTK

Segmentations

VTP

.vtp

VTK

Segmentations

STL

.stl

MeshLab

Segmentations

Big data viewer HDF5

.h5

Fiji

Segmentations

EMX

.emx

EMX in Bsoft or EMAN2

Electron microscopy exchange format metadata (reference)

SCIPION workflow

.json

Scipion for processing or any plain text reader for viewing (e.g., Vim)

Integration, reproducibility and analysis workflow files for Scipion - an image processing framework to obtain 3D models of macromolecular complexes using Electron Microscopy

JPEG

.jpg/.jpeg

Fiji

Figures, etc.

PNG

.png

Fiji

Representative thumbnail, paper figures, etc.

TIFF

.tiff

Fiji

Representative thumbnail, paper figures, etc.

TXT

.txt

Vim or Emacs

Additional details, data-collection information, defocus parameters, etc.

SHELL

.sh

Vim or Emacs

Shell scripts, e.g. for gain normalisation with MotionCor2 and similar tasks

AVI

.avi

VLC media player

Video guides through a dataset

MPEG

.mpg/.mpeg/.mp4

QuickTime Player or MPC-HC

Video guides through a dataset

BLENDER

.blend

Blender

Blender project of segmented objects

ImageJ

.ijm

ImageJ or Fiji

ImageJ macro language scripts

OBJ

.obj

Fiji

Segmented 3D objects in a plain text format with basic geometry and material support, e.g. as produced by Blender
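Before uploading, a depositor might want to list the files whose extensions are not in the table above, since those would need to be discussed with EMPIAR first. The Python sketch below is one way to do this. The extension set is transcribed from the table; the function names are hypothetical, and the case-insensitive matching is an assumption rather than a documented EMPIAR rule. Compressed archives are not handled.

```python
from pathlib import PurePath

# Extensions transcribed from the table above (image data and auxiliary data).
PREFERRED_EXTENSIONS = {
    ".mrc", ".mrcs", ".dm3", ".dm4", ".tiff", ".tif", ".spi", ".h5",
    ".hed", ".img", ".eer", ".xml", ".smv", ".rec", ".nxs",
    ".am", ".vtk", ".vtp", ".stl", ".emx", ".json", ".jpg", ".jpeg",
    ".png", ".txt", ".sh", ".avi", ".mpg", ".mpeg", ".mp4", ".blend",
    ".ijm", ".obj",
}


def is_preferred_format(filename: str) -> bool:
    """Return True if the file extension appears in the preferred-format table."""
    # Assumption: extensions are compared case-insensitively.
    return PurePath(filename).suffix.lower() in PREFERRED_EXTENSIONS


def files_needing_approval(filenames):
    """Return the files whose format should be discussed with EMPIAR beforehand."""
    return [name for name in filenames if not is_preferred_format(name)]
```

Note that an IMAGIC stack needs both its .hed and .img files; this sketch does not check for such pairs.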

The depositor may optionally upload data in compressed archives. The benefit is faster upload for the depositor and faster download for the end-user, at the computational cost of compressing and decompressing the files. If compression is used, all files must be compressed in one of the following formats: XZ, BZIP2, 7-ZIP.
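For depositors who script their uploads, the following Python sketch compresses a single file to XZ or BZIP2 with the standard library. The function name `compress_file` and the choice to write the output next to the input are illustrative assumptions. 7-ZIP has no standard-library support in Python and is therefore not covered here.

```python
import bz2
import lzma
import shutil
from pathlib import Path


def compress_file(path, fmt="xz"):
    """Write a compressed copy of `path` next to it and return the new path."""
    openers = {"xz": lzma.open, "bz2": bz2.open}
    if fmt not in openers:
        raise ValueError(f"unsupported format: {fmt}")
    src = Path(path)
    # e.g. stack.mrcs -> stack.mrcs.xz
    dst = src.with_name(src.name + "." + fmt)
    with open(src, "rb") as fin, openers[fmt](dst, "wb") as fout:
        # Stream in chunks so that large image files are not read into memory at once.
        shutil.copyfileobj(fin, fout)
    return dst
```

The original file is left in place, so the depositor can verify the compressed copy before removing anything.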

4. Deposition and accession code assignment

4.1 EMPIAR deposition code.

An EMPIAR deposition code is assigned as soon as a user creates a new deposition. This code is used internally by the EMPIAR deposition and annotation system and should not be used publicly.

4.2 EMPIAR accession code.

A public EMPIAR accession code is assigned immediately after the user submits the deposition. The accession code is sent in an automatic email to the depositor and the owner(s). Also, anyone with permission to view the deposition will be able to see the accession code on the main deposition page. The EMPIAR accession code may be used, e.g., in publications, to refer to the entry.

5. Processing procedures

5.1 Deposition process.

Data deposition to EMPIAR consists of the following steps:

      General metadata and information about the deposition is provided by the depositor (EMDB accession ID, title, authors, etc.).

      General metadata is validated by the system.

      Data consisting of image sets in separate directories is uploaded by the depositor.

      Description of each image set (image set metadata) is provided and linked to the corresponding image set directory by the depositor.

      Image set metadata is validated by the system.

      The depositor submits the entry for further processing by EMPIAR staff. Note: the depositor will not be able to submit the entry unless all validation steps have been successful.

      The deposition must be submitted within five calendar months after its creation. An email will be sent to the depositor two months before this deadline followed by monthly reminders. After five months all the uploaded data will be removed and the deposition’s status changed to ABANDONED. For the avoidance of doubt: data belonging to an ABANDONED deposition is no longer kept on EMPIAR file systems and cannot be retrieved by EMPIAR staff.

      Immediately upon submission, the depositor and the owner of the entry receive notification of successful submission and are given the EMPIAR accession code that can be used in a publication to link to the EMPIAR entry.
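The submission deadline and reminder dates in the steps above can be worked out with a short Python sketch. It assumes that the first email (two months before the deadline) falls three calendar months after creation and that the following monthly reminder falls four months after creation. The function names and the clamping of dates at month ends are illustrative choices, not a description of the EMPIAR system.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def submission_schedule(created: date) -> dict:
    """Return the assumed reminder dates and the five-month submission deadline."""
    return {
        "first_reminder": add_months(created, 3),   # two months before the deadline
        "second_reminder": add_months(created, 4),  # one month before the deadline
        "deadline": add_months(created, 5),         # unsubmitted data removed, status ABANDONED
    }
```

For a deposition created on 2023-10-23, this gives reminders on 2024-01-23 and 2024-02-23 and a deadline of 2024-03-23.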

5.2 Processing procedure.

Deposited data are processed (also referred to as annotation or curation) by EMPIAR staff, which involves the following steps:

      Checking the consistency of the deposited data and metadata with the information specified in the deposition forms (EMDB accession ID, title, authors, etc.), and with any related EMDB entries.

      Ensuring that image set metadata is provided and is correct, and that it is correctly associated with the corresponding image set directory.

      If any major changes have to be made, then the entry is unlocked for further editing by depositors. The entire entry is checked again after re-submission.

The deposition is locked (i.e., not editable by the depositor) from the moment it is submitted. Only if changes that need to be made are identified during the processing procedure will the entry be unlocked again. Communication between the depositors and EMPIAR staff can be carried out within the deposition system or by regular email.

5.3 Release process.

The EMPIAR entry can progress to the release process phase only when the following conditions have been met:

      All submitted deposition forms have been correctly completed and successfully validated by EMPIAR staff;

      Representative images inspected by EMPIAR staff appear consistent with descriptions of the data;

      Annotation has finished.

Once these conditions have been fulfilled, the EMPIAR entry is released to the public in accordance with the release instruction provided by the depositor (see 5.3.1 below for available options).

5.3.1 Release instruction.

EMPIAR depositions are released (made available to the public) in accordance with the release instruction provided during deposition. Release instruction options are summarised in the table below. (Note that the physical release of large entries is not instantaneous. Synchronisation with mirror sites may lead to additional delays before an entry is shown on such sites.)

Release instruction

Description

REL

As soon as the annotation procedure is complete and the entry has been approved by the depositor, the release procedure will be initiated

EMDBPUB

Release after the associated EMDB entry has been released. If one year after the deposition date the associated EMDB entry has not been released, the EMPIAR entry will be deleted and never be publicly released. (Later release will require the data to be deposited anew.) The EMPIAR accession code will not be recycled. A one-time extension of no more than 6 months will be considered if (one of) the owner(s) requests this and provides a reasonable explanation

HPUB

Release after the primary citation for the dataset becomes available. The same procedure as for EMDBPUB will be applied if the publication is not available one year after the deposition date

HPRE

Release after the related preprint citation has been published. The same procedure as for EMDBPUB will be applied if the publication is not available one year after the deposition date

HOLD

Release after a specified period, not to exceed one year. This option is only available if there is no related EMDB entry or publication. A one-time extension of no more than 6 months will be considered if (one of) the owner(s) requests this and provides a reasonable explanation
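The release instructions in the table above can be summarised as a small lookup, sketched here in Python. The enum and function names are hypothetical. The sketch only answers whether the event named by the instruction has occurred; it does not model annotation, depositor approval, the one-year limit or extensions.

```python
from enum import Enum


class ReleaseInstruction(Enum):
    REL = "REL"          # release once annotation is complete and approved
    EMDBPUB = "EMDBPUB"  # release after the associated EMDB entry is released
    HPUB = "HPUB"        # release after the primary citation becomes available
    HPRE = "HPRE"        # release after the related preprint is published
    HOLD = "HOLD"        # release after a specified period (at most one year)


def release_condition_met(instruction, *, emdb_released=False,
                          citation_available=False,
                          preprint_published=False,
                          hold_elapsed=False):
    """Return True when the event named by the release instruction has occurred."""
    conditions = {
        ReleaseInstruction.REL: True,  # no external event to wait for
        ReleaseInstruction.EMDBPUB: emdb_released,
        ReleaseInstruction.HPUB: citation_available,
        ReleaseInstruction.HPRE: preprint_published,
        ReleaseInstruction.HOLD: hold_elapsed,
    }
    return conditions[instruction]
```

For example, `release_condition_met(ReleaseInstruction.HPUB, citation_available=True)` returns `True`, while the same call for `EMDBPUB` with no EMDB release returns `False`.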

Information about the entry will not be made public until the entry is released, except when requested by a journal considering the related manuscript for publication. In that case, the journal must provide the EMPIAR accession code, manuscript title and the author list of the publication for verification purposes. We will provide information about the status of the deposition prior to release only if the author list supplied by the journal and the author list of the EMPIAR entry have at least one PI in common.

5.3.2 Status codes.

Depositions can have one of a number of status codes that are described in the following table.

Status code

Description

PROC

The entry has been submitted

REL

The entry has been publicly released

WAIT

The entry has been examined, but additional information from the depositor is awaited

OBS

The entry has been released, then obsoleted. The data is no longer part of the active archive but is still publicly available. See section 7.1

WDRN

The entry has been withdrawn before release and has never been publicly available. See section 7.2

UNARCH

The entry has been removed due to unusual circumstances. Depending on the case, the entry may still be publicly available. See section 7.4

5.3.3 Procedure for release.

In order to release an entry and make it publicly available, it is necessary to synchronise the uploaded data to the public archive at EMBL-EBI (and subsequently to any official mirror sites), and to create and update a number of files and database records. This involves various processes including:

      The depositor is sent an email (copied to the owner(s)) with a request to approve the release (for all release instruction cases). If the depositor approves (or once fourteen days have elapsed since the request and no reply has been received), then the entry release is initiated.

      Header files are created in the deposition upload directory.

      The synchronisation process of the upload directory and the public archive is initiated (i.e., all files are copied over).

      The representative thumbnail image(s) uploaded by the depositor.

      A request is submitted to www.crossref.org for the DOI assignment of the EMPIAR entry (that will resolve to the EMPIAR web page for that entry at EMBL-EBI).

      The depositor and the owner(s) are notified that the entry has been released.

      Once the public archive at EMBL-EBI has been updated, any official EMPIAR mirror sites can begin their synchronisation process.

6. Modification of entries

6.1 Entry changes before release.

Before public release, most issues can be resolved by communicating with the depositor via the deposition system. To delete, rename or move uploaded files we require the depositor to contact the operators of the EMPIAR archive. In some cases after submission (which locks the deposition from further modifications), additional data or information from the depositor may be required, making it necessary to unlock the deposition. This option is used sparingly to avoid creating inconsistencies in the metadata.

6.2 Entry changes after release.

In some cases, it may be necessary to modify an entry after release. This may apply to both the data and the metadata. A common example of a metadata change is addition of information about the publication. Occasionally, additional data files may need to be added or unintentionally uploaded files may need to be removed. Such changes should be discussed with the operators of the EMPIAR archive and should be justified by the depositor. Changes after release should be made only when strictly necessary. Depending on the nature of the changes, it may take some time for them to be reflected in the public archive and on any official mirror sites. A version history is maintained and distributed in the public archive.

6.2.1 Entry changes requested by EMPIAR after release.

In some cases additional information will be required or some files will have to be renamed, moved or removed. The EMPIAR team will contact the depositor; if no reply is received within three months, the entry will be kept as is, modified or obsoleted at the discretion of the EMPIAR team.

6.3 Abandoned unlocked entries.

An unlocked entry must be re-submitted within one month. An email will be sent to the depositor fifteen days before the due date, with additional reminders every three days. Once the deadline expires, the entry will automatically be re-submitted, locked from further changes by the depositor and handed back to an EMPIAR annotator. If any critical data is missing from the deposition and the depositor has not responded for over a month, the deposition may be removed at the discretion of EMPIAR staff. Otherwise, it can be released in the state it was in at the moment of re-submission, while still adhering to the release instruction.
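The reminder schedule for an unlocked entry can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that no reminder is sent on the due date itself, which the policy text does not specify.

```python
from datetime import date, timedelta


def resubmission_reminders(due: date):
    """Return reminder dates: fifteen days before the due date, then every three days."""
    reminders = []
    current = due - timedelta(days=15)
    # Assumption: reminders stop before the due date, when automatic re-submission occurs.
    while current < due:
        reminders.append(current)
        current += timedelta(days=3)
    return reminders
```

Under these assumptions an entry due for re-submission on 2024-01-31 would trigger reminders on 16, 19, 22, 25 and 28 January 2024.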

7. Removal of entries

7.1 Entry obsoletion.

An EMPIAR entry can only be obsoleted after it has been released. Obsoleting an entry will change the status code of the deposition to OBS and move its files out of the active part of the public archive into a separate area of the public archive (called the "obsolete archive"). Note that the entry will still be publicly accessible. Entry obsoletion can be requested, and needs to be justified, by the owner(s) or depositor of an entry. A confirmation email will be sent to the depositor and owner(s).

7.2 Entry withdrawal.

If an EMPIAR entry has not yet been released, it can be withdrawn at the request of the depositor and owner(s). An entry that is withdrawn will never be released and there will be no public record of the deposition ever having been made. In internal records, the status code will be changed to WDRN.

7.3 Entry withdrawal after the hold period expires.

The maximum hold period for EMPIAR entries is twelve calendar months after the annotation of the entry has ended. A reminder email will be sent to the depositor two months and one month before the hold period is set to expire. After twelve months, all the uploaded data will be removed and, for entries that have never been released, the entry’s status changed to WDRN.

7.4 Entry removal in unusual circumstances*.

Circumstances may arise in which the integrity, correctness, ownership or provenance of data is called into question. In such unusual circumstances, the operators of the EMPIAR archive may decide to make an entry, in whole or in part, obsolete (moved out of the active archive, but still publicly accessible) or to remove it entirely from the public record. Examples of cases in which this might occur are:

      The publication describing a dataset is retracted by (some of) the author(s), their home institute, or the journal in which it was published, and the retractor(s) request that the corresponding data/entries in EMPIAR are removed from the scientific record as well.

      An official investigative body (for example, the Office of Research Integrity in the USA) recommends that data or entries in EMPIAR should be removed from the scientific record.

In all such cases, the operators of the EMPIAR archive will endeavour to ascertain if (parts of) one or more entries need to be removed and, if so, in which way (obsoletion or withdrawal). Such unusual cases will be documented on the website of the operators of the EMPIAR archive (and any sites that mirror it). The status code of affected entries will be changed to UNARCH.

* The wording of this section is preliminary. Input and views from the community on this issue are welcome.
