Storage Plugins
Data storage is one of the primary requirements of a PACS archive. Medical imaging storage requirements have become increasingly demanding, to the point that relying on a server’s local file system is inadequate in some use cases. Delegating storage to cloud services, for instance, has been thoroughly discussed in recent years.
Storage plugins (or storage providers) represent a particular “place” where files can be read and stored, and sometimes also define how these files are stored and read. The Dicoogle Storage API serves the following two purposes:
- To make an abstraction over the underlying storage technology, thus being able to use and evaluate different sources of data storage (e.g. cloud storage services such as Amazon S3) and different forms of persistence (e.g. using a document-oriented database instead of a file system). With this common API, DICOM object reading and storing becomes possible, regardless of the underlying technology.
- To augment storage and retrieval procedures with certain algorithms, such as for anonymization, compression, and encryption.
Programmatically, the storage interface is currently defined as below:
/** Storage plugin interface. These types of plugins provide an abstraction to reading and writing from
* files or data blobs.
*
* @author Luís A. Bastião Silva <bastiao@ua.pt>
* @author Frederico Valente
*/
public interface StorageInterface extends DicooglePlugin {
/**
* Gets the scheme URI of this storage plugin.
*
* @see URI
* @return a string denoting the scheme that this plugin associates to
*/
public String getScheme();
/**
* Checks whether the file in the given path can be handled by this storage plugin.
*
* The default implementation checks that the given location URI
* has the exact same scheme as the scheme returned by {@link #getScheme()}.
*
* @param location a URI containing a scheme to be verified
* @return true if this storage plugin is in charge of URIs in the given form
*/
public default boolean handles(URI location) {
return Objects.equals(this.getScheme(), location.getScheme());
}
/**
* Provides a means of iteration over all existing objects at a specified location,
* including those in sub-directories.
* This method is particularly nice for use in for-each loops.
*
* The provided scheme is not relevant at this point, but the developer must avoid calling this method
* with a path of a different scheme.
*
* <pre>
* URI uri = URI.create("file://dataset/");
* for (StorageInputStream dicomObj: storagePlugin.at(uri)) {
* System.err.println(dicomObj.getURI());
* }
* </pre>
*
* @param location the location to read
* @param parameters a variable list of extra retrieval parameters
* @return an iterable of storage input streams
* @see StorageInputStream
*/
public Iterable<StorageInputStream> at(URI location, Object... parameters);
/**
* Obtains an item stored at the exact location specified.
*
* The provided scheme is not relevant at this point,
* but the developer must avoid calling this method
* with a path of a different scheme.
*
* <pre>
* URI uri = URI.create("file://dataset/CT/001.dcm");
* StorageInputStream item = storagePlugin.get(uri);
* if (item != null) {
* System.err.println("Item at " + item.getURI() + " is available");
* }
* </pre>
*
* The default implementation calls {@linkplain #at}
* and returns the first item if its URI matches the location requested.
* Implementations may wish to override this method for performance reasons.
*
* @param location the URI of the item to retrieve
* @param parameters a variable list of extra retrieval parameters
* @return a storage item if it was found, <code>null</code> otherwise
* @see StorageInputStream
*/
default public StorageInputStream get(URI location, Object... parameters) { ... }
/**
* Stores a DICOM object into the storage.
*
* @param dicomObject Object to be Stored
* @param parameters a variable list of extra parameters for the retrieve
* @return The URI of the previously stored Object.
*/
public URI store(DicomObject dicomObject, Object... parameters);
/**
* Stores a new element into the storage.
*
* @param inputStream an input stream with the contents to be stored
* @param parameters a variable list of extra parameters for the retrieve
* @return the URI of the stored data
* @throws IOException if an I/O error occurs
*/
public URI store(DicomInputStream inputStream, Object... parameters) throws IOException;
/** Removes an element at the given URI.
*
* @param location the URI of the stored data
*/
public void remove(URI location);
/** Lists the elements at the given location in the storage's file tree.
* Unlike {@link StorageInterface#at}, this method is not recursive and
* can yield intermediate URIs representing other directories rather than
* objects.
*
* Directories can be distinguished from regular files
* by the presence of a trailing forward slash in the URI.
*
* The provided scheme is not relevant at this point, but the developer
* must avoid calling this method with a path of a different scheme.
*
* @param location the base storage location to list
*
* @return a standard stream of URIs representing entries in the given base
* location
* @throws UnsupportedOperationException if this storage does not support
* listing directories
* @throws NoSuchFileException if the given location does not exist in the
* storage
* @throws NotDirectoryException if the given location does not refer to a
* listable entry (a directory)
* @throws IOException if some other I/O error occurs
*/
public default Stream<URI> list(URI location) throws IOException { ... }
}
Storing files
The two store method overloads are used to store new objects. The only difference between the overloads is that one takes a DicomInputStream, whereas the other takes a DicomObject. Unless the storage can take advantage of streamed processing, it is usual to read the stream into an object (using DicomInputStream#readDicomObject) and call the other method overload:
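@Override
public URI store(DicomInputStream inputStream, Object ... parameters) throws IOException {
    // read the whole stream into memory, then delegate to the DicomObject overload
    DicomObject obj = inputStream.readDicomObject();
    return this.store(obj, parameters);
}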
A unique identifier for the object is to be created by the storage and returned when successful. The stored object should then become accessible by using the same URI. A traditional file storage might create a hierarchical URI based on the DICOM object’s meta-data (so as to categorize files by modality, study, series, and so on), and serialize it into a file with its path defined by the URI. Therefore, a valid URI would be, for example:
file:/my-storage/MyHospital/CT/2004/06/07/patient1/001.dcm
Remember when we indexed a directory in "Using Dicoogle"?
This is the approach taken by the provided file storage plugin, which makes indexing of existing files easier: you can infer the URI of a file just by looking at its path in the system! Although it is useful to have a trivial mapping such as this one, this behaviour is not required for Dicoogle storages in general.
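To illustrate the hierarchical naming described above, the following sketch builds a file URI from a handful of metadata values. The class StorageUriBuilder, its parameters, and the chosen layout (institution, modality, date, patient, file name) are assumptions made for this example. They are not part of the Dicoogle API, and the provided file storage plugin may organize its files differently.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for illustration only; not part of the Dicoogle API. */
final class StorageUriBuilder {
    private StorageUriBuilder() {}

    /**
     * Builds a hierarchical URI such as
     * file:/my-storage/MyHospital/CT/2004/06/07/patient1/001.dcm
     *
     * @param root absolute base path of the storage, e.g. "/my-storage"
     * @param studyDate a date in the DICOM DA format (YYYYMMDD)
     */
    static URI build(String root, String institution, String modality,
                     String studyDate, String patientId, String fileName) {
        if (studyDate == null || !studyDate.matches("\\d{8}")) {
            throw new IllegalArgumentException("expected a YYYYMMDD date: " + studyDate);
        }
        String path = String.join("/",
                root, institution, modality,
                studyDate.substring(0, 4),   // year
                studyDate.substring(4, 6),   // month
                studyDate.substring(6, 8),   // day
                patientId, fileName);
        try {
            // the multi-argument constructor percent-encodes characters
            // that are not legal in a URI path (spaces, for example)
            return new URI("file", null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Calling StorageUriBuilder.build("/my-storage", "MyHospital", "CT", "20040607", "patient1", "001.dcm") yields the URI shown earlier. A real storage would also have to sanitize values containing a forward slash, which this sketch does not attempt.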
Fetching files
The at method introduces a new abstraction for files in a Dicoogle storage provider. The StorageInputStream interface, despite the name, represents an item in storage (often a DICOM file). An ordinary Java input stream can be obtained by calling getInputStream(). The code below would allow you to read a file as DICOM data. Indexers and other plugins may instead interpret the file as images, or arbitrary binary data, depending on their purpose.
Iterable<StorageInputStream> items = myStorage.at(URI.create("file:/data/X/000.dcm"));
StorageInputStream f = items.iterator().next(); // expect one file
// readDicomObject() may throw IOException
DicomObject dcm = new DicomInputStream(f.getInputStream()).readDicomObject();
// use DICOM object
Storage plugins are required to implement the logic behind obtaining a list of files by URI, as well as a class type to be used as a StorageInputStream. The following should be kept in mind:
- If the storage is hierarchical, calling at() for a parent resource must recursively yield all files inside that directory. This also means that store.at("file:/") would give us all files in storage.
- As stores may end up having millions of entries, it is recommended to build a lazy iterable of the files upon a call to at(). For instance, the iterator may keep a queue of directories yet to be expanded to their respective children.
- Never return null from this method, and always return a valid iterable object, even if it has to be an empty one.
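The lazy approach suggested in the list above can be sketched as follows. This is a minimal example over the local file system, assuming plain java.io.File entries; the class name LazyFileIterable is hypothetical, and a real plugin would wrap each file in its own StorageInputStream implementation instead of yielding File objects.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: lazily walks a directory tree, yielding regular files only. */
final class LazyFileIterable implements Iterable<File> {
    private final File root;

    LazyFileIterable(File root) {
        this.root = root;
    }

    @Override
    public Iterator<File> iterator() {
        return new Iterator<File>() {
            // directories whose children have not been listed yet
            private final Deque<File> pendingDirs = new ArrayDeque<>();
            // files already discovered but not yet handed out
            private final Deque<File> pendingFiles = new ArrayDeque<>();

            {
                if (root.isDirectory()) pendingDirs.add(root);
                else if (root.isFile()) pendingFiles.add(root);
                // a missing root leaves both queues empty:
                // the result is an empty iteration, never null
            }

            @Override
            public boolean hasNext() {
                // expand one directory at a time until a file turns up
                while (pendingFiles.isEmpty() && !pendingDirs.isEmpty()) {
                    File[] children = pendingDirs.poll().listFiles();
                    if (children == null) continue; // unreadable directory
                    for (File child : children) {
                        if (child.isDirectory()) pendingDirs.add(child);
                        else if (child.isFile()) pendingFiles.add(child);
                    }
                }
                return !pendingFiles.isEmpty();
            }

            @Override
            public File next() {
                if (!hasNext()) throw new NoSuchElementException();
                return pendingFiles.poll();
            }
        };
    }
}
```

Only the directories visited so far are held in memory, so iteration can begin before the whole tree has been scanned. The order of the yielded files is unspecified in this sketch.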
Other methods
getScheme() should always return the same string, which must be a valid URI scheme. A basic file storage plugin would use the file scheme like this:
@Override
public String getScheme() {
return "file";
}
Dicoogle will identify whether a particular item in storage (existent or not yet existent) should be handled by this plugin through the URI scheme. By default, this check is done through an exact match. That is, file://CT/I0001.dcm is handled by the plugin with getScheme() above, but file+ssl://CT/I0001.dcm is not. If you need to change this behavior, you can override the handles(URI) method.
@Override
public boolean handles(URI location) {
    // the scheme is null for relative URIs, so compare from the constant side
    String scheme = location.getScheme();
    return "file".equals(scheme) || "file+ssl".equals(scheme);
}
get() was introduced in Dicoogle 3.4, enabling the retrieval of a single item in storage. This method has a default implementation based on at(), but can be overridden for a more efficient routine.
@Override
public StorageInputStream get(URI location, Object ... parameters) {
    // assumes a file-based storage, where the URI maps directly to a local path
    File file = new File(location);
    if (!file.exists()) {
        return null;
    }
    return new MyStorageInputStream(file);
}
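For comparison, the default behaviour described in the interface documentation (call at() and return the first item if its URI matches the requested location) could look roughly like the sketch below. It is not the actual Dicoogle source: Item and Source are stand-ins for StorageInputStream and the storage plugin, introduced here so that the example compiles on its own.

```java
import java.net.URI;
import java.util.Iterator;

/** Hypothetical sketch of a get() built on top of at(); not the Dicoogle source. */
final class DefaultGetSketch {
    private DefaultGetSketch() {}

    /** Stand-in for StorageInputStream (assumption): only the URI accessor is modelled. */
    interface Item {
        URI getURI();
    }

    /** Stand-in for the at() method of a storage plugin (assumption). */
    interface Source {
        Iterable<Item> at(URI location);
    }

    /** Returns the first item yielded by at() if its URI equals the location, else null. */
    static Item get(Source storage, URI location) {
        Iterator<Item> it = storage.at(location).iterator();
        if (!it.hasNext()) {
            return null; // nothing stored at this location
        }
        Item first = it.next();
        // at() on a directory yields its descendants, whose URIs differ from the request
        return location.equals(first.getURI()) ? first : null;
    }
}
```

Since at() may have to set up a recursive traversal before yielding anything, a direct lookup such as the file-based override above is usually cheaper.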
list() was introduced in Dicoogle 3, enabling storage interfaces to provide a list of entries at a particular position in the storage tree. Note that this is different from the method at(): list is optional and provides a shallow list of files and directories directly below the given directory, whereas at is required and provides a full list of all files (leaf items) in storage at the given base directory. Implementers may ignore this method, but implementing it may grant additional features to end applications. Consumers of this method should catch UnsupportedOperationException to handle situations in which the method is not implemented.
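Both sides of this contract can be sketched for a file-based storage as shown below. The class ShallowLister, the Lister interface (a stand-in for the list method of a storage plugin), and the listOrEmpty helper are assumptions made for this example and do not exist in Dicoogle. The listing relies on java.nio.file.Files.list, which is shallow by design.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for a file-based storage; not part of the Dicoogle API. */
final class ShallowLister {
    private ShallowLister() {}

    /** Stand-in for the list method of a storage plugin (assumption). */
    interface Lister {
        Stream<URI> list(URI location) throws IOException;
    }

    /** Implementer side: direct children only, directories marked by a trailing slash. */
    static Stream<URI> list(URI location) throws IOException {
        Path dir = Paths.get(location); // expects an absolute file: URI
        if (!Files.exists(dir)) throw new NoSuchFileException(dir.toString());
        if (!Files.isDirectory(dir)) throw new NotDirectoryException(dir.toString());
        List<URI> entries = new ArrayList<>();
        // the stream from Files.list holds an open directory handle, so close it
        try (Stream<Path> children = Files.list(dir)) {
            children.forEach(child -> {
                String s = child.toUri().toString();
                if (Files.isDirectory(child) && !s.endsWith("/")) s += "/";
                entries.add(URI.create(s));
            });
        }
        return entries.stream();
    }

    /** Consumer side: falls back to an empty list when listing is not supported. */
    static List<URI> listOrEmpty(Lister storage, URI location) throws IOException {
        try (Stream<URI> entries = storage.list(location)) {
            return entries.collect(Collectors.toList());
        } catch (UnsupportedOperationException e) {
            return Collections.emptyList();
        }
    }
}
```

Whether an empty list is the right fallback depends on the application. A consumer could instead hide the browsing feature altogether when the storage does not support listing.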