The CLARIN:EL infrastructure and metadata schema support the FAIR principles: Findability, Accessibility, Interoperability and Reuse of digital assets. This section provides an overview of these principles. For detailed information, please visit the GoFair website, where each principle is further subdivided and analysed.
First, data should be findable. One way to trace a digital object regardless of changes to its location on the internet is the persistent identifier (PID), a string that uniquely identifies a digital object. In CLARIN:EL each resource is given a PID when it is published, and the PID keeps resolving even if the resource has been removed from the central inventory. For example, the Sentiment Analysis Tool has been unpublished, but its PID (http://hdl.handle.net/11500/DEMOKRITOS-0000-0000-24A2-0) still directs the user to the resource view page, where a tombstone indicates that the resource is temporarily unavailable.
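As an illustration of how a handle-based PID is put together, the following minimal Python sketch splits the PID from the example above into its prefix and suffix and rebuilds the resolver URL. The function names are hypothetical and not part of CLARIN:EL; the `/api/handles/` path is assumed to be the JSON endpoint of the Handle.Net proxy and should be checked against the Handle.Net documentation before relying on it.

```python
from urllib.parse import urlparse

# Public Handle.Net proxy; the PID in the text resolves through it.
HANDLE_PROXY = "https://hdl.handle.net"


def parse_handle(pid: str) -> tuple[str, str]:
    """Split a handle PID (bare or as a resolver URL) into (prefix, suffix)."""
    # Accept both "11500/XYZ" and "http://hdl.handle.net/11500/XYZ".
    path = urlparse(pid).path if "://" in pid else pid
    prefix, sep, suffix = path.strip("/").partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {pid!r}")
    return prefix, suffix


def resolver_url(pid: str) -> str:
    """URL a browser would follow to reach the resource view page."""
    prefix, suffix = parse_handle(pid)
    return f"{HANDLE_PROXY}/{prefix}/{suffix}"


def handle_record_url(pid: str) -> str:
    """URL of the handle record as JSON (assumed proxy REST endpoint)."""
    prefix, suffix = parse_handle(pid)
    return f"{HANDLE_PROXY}/api/handles/{prefix}/{suffix}"


pid = "http://hdl.handle.net/11500/DEMOKRITOS-0000-0000-24A2-0"
print(parse_handle(pid))
print(resolver_url(pid))
```

Because the identifier is the prefix/suffix pair and not the landing page address, the resource can move or be unpublished without the PID changing.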
Another way to find data is via their metadata descriptions. The more metadata are provided, and the more accurate they are, the better. On the GoFair website the importance of metadata is pointed out with a simple rule of thumb: “you should never say ‘this metadata isn’t useful’; be generous and provide it anyway!”
Once the data are found, the user should know how to access them: “anyone with a computer and an internet connection can access at least the metadata” 1. Accessibility in this context is the ability to retrieve data and metadata without specialised or proprietary tools or communication methods. However, accessibility is not unconditional: authentication and/or authorisation may be required where necessary. In CLARIN:EL they are required when a user needs specific rights (as a curator, validator or supervisor) or wants to use the processing services. In these cases the user has to register or sign in first. Browsing, viewing and exporting metadata records, as well as downloading resources, are available to non-registered users.
Accessibility is also assured when metadata remain available even when the data are not. Besides the tombstone mentioned above for unpublished resources, CLARIN:EL also has "for info" resources: metadata records published without the data, either because the data are still being processed and are not ready to be published or because legal clearance is pending. These metadata records still provide all the necessary information about the upcoming data and offer contact details.
Interoperability concerns both data and metadata and the way they are interpreted by humans and computers. Put simply, the exchange and interpretation of data should be seamless, whether between humans or between machines. To allow for readability without the need for additional software (algorithms, translators, mappings), commonly accepted “controlled vocabularies, ontologies, thesauri and a good data model (a well-defined framework to describe and structure (meta)data) should be used” 2.
Towards this end, qualified references between resources (cross-references which explicitly state how resources are connected) are also needed. The CLARIN:EL metadata schema provides for such links, an example of which is presented in the image below. As indicated in the relations part of its resource view page, the OROSSIMO Corpus - Economics is part of the OROSSIMO Corpus and has as its outcome the Orossimo Terminological Resource - Economics.
In addition, all resources are cited with their PID included.
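The idea of a qualified reference can be sketched in a few lines of Python: each link carries an explicit relation type instead of being a bare pointer. This is only an illustration built on the OROSSIMO example; the class, the function and the relation labels `isPartOf` and `hasOutcome` are hypothetical and are not taken from the CLARIN:EL metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A qualified reference: source resource, typed relation, target resource."""
    source: str
    relation_type: str  # hypothetical label, not the schema's actual element name
    target: str


# The two relations described in the text for the OROSSIMO Corpus - Economics.
RELATIONS = [
    Relation("OROSSIMO Corpus - Economics", "isPartOf", "OROSSIMO Corpus"),
    Relation(
        "OROSSIMO Corpus - Economics",
        "hasOutcome",
        "Orossimo Terminological Resource - Economics",
    ),
]


def related(resource: str, relation_type: str, relations=RELATIONS) -> list:
    """Return the targets linked to `resource` by the given relation type."""
    return [
        r.target
        for r in relations
        if r.source == resource and r.relation_type == relation_type
    ]


print(related("OROSSIMO Corpus - Economics", "isPartOf"))
print(related("OROSSIMO Corpus - Economics", "hasOutcome"))
```

Because the relation type is stated explicitly, a program can tell a part-whole link from a derivation link without reading free text.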
In order to be able to reuse data, the metadata used should richly describe all aspects related to the data generation. The term plurality is used “to indicate that the metadata author should be as generous as possible in providing metadata, even including information that may seem irrelevant” 3.
For the same reason, the licensing status of data should be clear. The CLARIN:EL metadata schema has multiple metadata elements referring to all legal aspects (licence terms, URL, conditions of use etc.) of a resource. These appear as fields in the metadata editor 4 as shown in the image below: