.. _Examples:

###################################
Examples of metadata
###################################

In this chapter: metadata excerpts with their explanations, links to XSD and full XML metadata descriptions

The goal of this section is to familiarize users with the use of **metadata**. To do so, resource descriptions have been exported [#]_ from the `CLARIN:EL infrastructure `_ and excerpts of interest have been copied verbatim. Each metadata element is presented on its own and then briefly explained. Each example also links to the full XML description of the resource, for anyone wishing to see the metadata in context, and to the :ref:`XSD `, for a detailed representation of the element.

``resourceName``
*****************

The first metadata element is the ``resourceName`` from the `Greek Parliament Plenary Sessions (1989-2019) `_, a collection of the raw minutes of the Greek Parliament plenary sessions of the last 30 years (more than 1,000,000 speeches).

.. admonition:: XML

   .. code-block::

      Greek Parliament Plenary Sessions (1989-2019)
      Πρακτικά της Ολομέλειας του Ελληνικού Κοινοβουλίου (1989-2019)

As shown, the name can be provided in more than one language; the first language, by default, is English (xml:lang="en"), while the second is free of choice. Here the language chosen is Greek (xml:lang="el").

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``resourceCreator``
********************

The second excerpt is taken from the `KELLY word-list `_, a monolingual lexical conceptual resource. KELLY word-lists were created to facilitate the learning of a foreign/second language. The Greek part was created by the *Institute for Language and Speech Processing*, which is an **organization**.

.. admonition:: XML

   .. code-block::

      Organization
      Ινστιτούτο Επεξεργασίας του Λόγου
      Institute for Language and Speech Processing
      http://www.ilsp.gr/

The necessary information about the creator is enclosed between the ``resourceCreator`` tags.
First, the type of the creator (``actorType``) is defined; the creator of a resource can be a *person*, a *group* of people or an *organization*, as is the case for the KELLY word-list. Then the name of the organization is provided (in two languages, xml:lang="el" and xml:lang="en") as well as its website.

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``isPartOf``
**************

The next example is from the `Golden Part of Speech Tagged Corpus `_, a monolingual annotated corpus in Greek with 100,000 words. This corpus is a **subset** of the `Hellenic National Corpus `_, which contains more than 97 million words from a variety of sources and domains. The subset relationship is expressed through the ``isPartOf`` metadata element in the :ref:`CLARIN:EL metadata schema `.

.. admonition:: XML

   .. code-block::

      Ελληνικός Θησαυρός της Ελληνικής Γλώσσας
      Hellenic National Corpus
      http://hdl.handle.net/11500/ATHENA-0000-0000-23E2-9
      3.0

The ``isPartOf`` element includes the name of the resource (``resourceName``) from which the Golden Part has been derived, i.e. the *Hellenic National Corpus*, expressed in two languages (xml:lang="el" and xml:lang="en") along with its identifier (``LRIdentifier``) and version (``version``).

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``annotationType``
*******************

Alignment is the process that establishes translational equivalences between structural units (words, sentences etc.) of a text in a given language and a text with similar meaning in other language(s).
The `Greek-Bulgarian Bul-TM parallel corpus `_ is a *bilingual corpus* and, as the adjective *parallel* suggests, has been **aligned**.

.. admonition:: XML

   .. code-block::

      http://w3id.org/meta-share/omtd-share/Alignment1
      http://w3id.org/meta-share/meta-share/sentence
      false
      http://w3id.org/meta-share/meta-share/automatic
      TrAid
      unspecified

Alignment is considered a type of ``annotation``. The two languages have been aligned at *sentence* level (``segmentationLevel``) and there is *no* separate document (``annotationStandoff``) holding each language independently. The alignment was performed *automatically* (``annotationMode``); the tool used for it (``isAnnotatedBy``) is called *TrAid*, but no ``version`` is available (*unspecified*).

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``multilingualityType``
************************

The `DICTA-SIGN corpus `_ is a **multimedia** corpus, consisting of a video part and a text part, for four sign languages (British, French, German and Greek).

.. admonition:: XML

   .. code-block::

      http://w3id.org/meta-share/meta-share/parallel
      gss
      gss
      bfi
      bfi
      gsg
      gsg
      fsl
      fsl

Each corpus part is described separately. This excerpt describes the content of the **video part** of the resource. The languages in the video are sign languages and are aligned, as indicated by the value *parallel* of the ``multilingualityType`` element. Each language (``language``) is then presented separately with its language tag (``languageTag``) and id (``languageId``): gss (Greek Sign Language), bfi (British Sign Language), gsg (German Sign Language) and fsl (French Sign Language).

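
The language list described above can also be read programmatically. The following is a minimal sketch rather than the infrastructure's own tooling: the wrapper element ``corpusVideoPart``, the nesting of ``languageTag`` and ``languageId`` inside ``language``, and the absence of XML namespaces are assumptions made for illustration. Only the element names ``multilingualityType``, ``language``, ``languageTag`` and ``languageId`` and the values themselves come from the excerpt.

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Hypothetical reconstruction of the video-part excerpt. The wrapper
   # element name and the namespace-free layout are assumptions.
   SNIPPET = """
   <corpusVideoPart>
     <multilingualityType>http://w3id.org/meta-share/meta-share/parallel</multilingualityType>
     <language><languageTag>gss</languageTag><languageId>gss</languageId></language>
     <language><languageTag>bfi</languageTag><languageId>bfi</languageId></language>
     <language><languageTag>gsg</languageTag><languageId>gsg</languageId></language>
     <language><languageTag>fsl</languageTag><languageId>fsl</languageId></language>
   </corpusVideoPart>
   """

   # Names as given in the explanation of the excerpt.
   SIGN_LANGUAGE_NAMES = {
       "gss": "Greek Sign Language",
       "bfi": "British Sign Language",
       "gsg": "German Sign Language",
       "fsl": "French Sign Language",
   }


   def describe_languages(xml_text):
       """Return the multilinguality type and the names of the listed languages."""
       root = ET.fromstring(xml_text)
       # Keep only the last path segment of the controlled-vocabulary URI.
       kind = root.findtext("multilingualityType", default="").rsplit("/", 1)[-1]
       ids = [lang.findtext("languageId") for lang in root.iter("language")]
       # Fall back to the raw id when no name is known for it.
       return kind, [SIGN_LANGUAGE_NAMES.get(i, i) for i in ids]


   kind, names = describe_languages(SNIPPET)
   print(kind)
   for name in names:
       print(name)

A real export is likely to declare XML namespaces, in which case the element lookups would need the corresponding namespace prefix or URI.
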

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``isDocumentedBy``
********************

Sometimes extra information about a resource is available in external documents such as papers and/or conference announcements. Such is the case with the `Orossimo Terminological Resource - History `_, which is documented in the *Collection of digital terminological resources: methodology and results*.

.. admonition:: XML

   .. code-block::

      Συλλογή ηλεκτρονικών ορολογικών πόρων: μεθοδολογία και αποτελέσματα
      Collection of digital terminological resources: methodology and results

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``fundingProject``
********************

The following example is more complex as it includes various metadata elements. It is taken from the `Trilingual Terminological Dictionary `_, a lexical/conceptual resource with a threefold aim: to assist students in learning the subject areas of the curriculum, to improve their language skills in Greek and to familiarize them with information technology.

.. admonition:: XML

   .. code-block::

      Τρίγλωσσο Ορολογικό Λεξικό
      Trilingual Terminological Dictionary
      https://bit.ly/2V4hWLe
      https://www.ilsp.gr/projects/tol/
      http://w3id.org/meta-share/meta-share/euFunds
      http://w3id.org/meta-share/meta-share/nationalFunds
      Organization
      Ministry of Education and Religious Affairs
      Organization
      Ευρωπαϊκή Επιτροπή
      European Commission
      https://ec.europa.eu/info/index_en

The resource is the result of a project (``fundingProject``) bearing the same name (``projectName``), *Trilingual Terminological Dictionary*.
The information provided for the project comprises the available ``websites``, the ``fundingType`` and the ``funders``. The project was financed with *EU and national funds*, and the funders were two organizations, the *Ministry of Education and Religious Affairs* and the *European Commission*.

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``inputContentResource``
***************************

The following XML excerpt provides information on the **input** of `Voyant Tools `_, a web-based text reading and analysis environment.

.. admonition:: XML

   .. code-block::

      http://w3id.org/meta-share/meta-share/corpus
      http://w3id.org/meta-share/meta-share/text
      http://w3id.org/meta-share/omtd-share/Pdf
      http://w3id.org/meta-share/omtd-share/Rtf
      http://w3id.org/meta-share/omtd-share/Xml
      http://w3id.org/meta-share/omtd-share/ConllU
      http://w3id.org/meta-share/omtd-share/Html

Voyant Tools can take as input (``inputContentResource``) corpora (``processingResourceType``) of textual data in the following formats (``dataFormat``): *plain text, PDF, RTF, XML, ConllU and HTML*.

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``outputResource``
*******************

The next excerpt presents the output of the `ILSP Language Identification System `_.

.. admonition:: XML

   .. code-block::

      http://w3id.org/meta-share/meta-share/corpus
      el-Latn
      el
      Latn
      Greeklish
      el-Grek
      el
      Grek
      fr
      fr
      en
      en
      de
      de
      nl
      nl
      http://w3id.org/meta-share/meta-share/text

This tool performs language identification for *Greeklish, Greek, English, German, Dutch and French*.
Greeklish, as seen in the excerpt above, is a variety (``languageVarietyName``) of the Greek language: the language (``languageId``) is defined as *Greek* (el) but the script (``scriptId``) is *Latin* (Latn).

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

----------------------------------------------------------------------------------------------------------------------------------

``attributionText``
*******************

The last example showcases the ``attributionText`` of a language description resource, the `PANACEA Environment Corpus n-grams EL `_.

.. admonition:: XML

   .. code-block::

      PANACEA σώμα ελληνικών n-γραμμάτων (n-grams) περιβαλλοντικού τομέα. Δημιουργός: Ινστιτούτο Επεξεργασίας του Λόγου - Ερευνητικό Κέντρο Αθηνά. Άδεια: Creative Commons Attribution Share Alike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/legalcode, https://creativecommons.org/licenses/by-sa/4.0/). Πηγή: http://hdl.handle.net/11500/ATHENA-0000-0000-23DA-3 (CLARIN:EL)
      PANACEA Environment Corpus n-grams EL (Greek) by Institute for Language and Speech Processing - Athena Research Center used under Creative Commons Attribution Share Alike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/legalcode, https://creativecommons.org/licenses/by-sa/4.0/). Source: http://hdl.handle.net/11500/ATHENA-0000-0000-23DA-3 (CLARIN:EL)

The licence of the resource is the *CC-BY-SA 4.0 International*. *"This license lets others remix, adapt, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This license is often compared to "copyleft" free and open source software licenses.
All new works based on yours will carry the same license, so any derivatives will also allow commercial use."* [#]_ The attribution text serves exactly this purpose: it names the resource creator, the *Institute for Language and Speech Processing - Athena Research Center*, and the licence under which the resource and all its derivatives are to be distributed.

* :ref:`See the resource full XML description `
* `See how the element is described in detail in XSD `_

.. [#] These tags come in pairs; the opening and ending tags are identical except for the **forward slash**.
.. [#] More information on the `Creative Commons website `_.