.. _Process:

#########################
Processing
#########################

CLARIN:EL offers three approaches to processing. One starts from specific **datasets** that fulfil certain conditions [#]_: you first select the *dataset* you want to process and then the *function*, i.e. the type of processing you want to apply. The other two use the **function** as the starting point: you select *what* you want to do, then the *tool/workflow* that suits your needs, and then the *dataset* on which you will apply the service.

If you are interested in the function, please see how you can use the :ref:`workflow registry` or the :ref:`processing services`. In these cases you will have to upload your own dataset. If, on the other hand, you are interested in a specific dataset on which you would like to apply one or more of the integrated services, check out the information provided on :ref:`processable corpora`.

.. attention:: Processing is available **only to registered users** who are signed in. If you don't have an account, see :ref:`here` for how to register.

.. _Processable:

1. Starting with the data
******************************

Corpora whose features make them compatible with the workflows of the infrastructure are indicated as :guilabel:`processable` [#]_. These corpora are offered as a `preselection `_ on the inventory home page. They are either *monolingual* in **Greek**, **English**, **German** or **Portuguese**, or *bilingual* with **Greek** as one language and **English**, **German** or **Portuguese** as the other.

To process one of these corpora, follow these **basic steps**:

Step 1. Select a corpus
=========================

The resource chosen for this scenario is a bilingual corpus: `A parallel subcorpus collected from the European Constitution (EN-EL) (Moses) `_.

.. image:: ChosenCorpus.png
    :width: 1200px

First move to the :ref:`lower section` of the view page and choose the **Access** tab.
Then click on :guilabel:`Process` [#]_.

.. image:: ResourceView14.png
    :width: 1200px

.. _Proceed:

Step 2. Select a function
============================

Once you click on :guilabel:`Process`, you will be directed to a selection of workflows from the :ref:`workflow registry`. These are the ones that can be used on the Greek part of the corpus you have chosen (you will also see a notification at the top of the page). Since the corpus is bilingual, you will later need to select a workflow for the English part as well.

.. image:: WorkflowSelection.png
    :width: 1200px

Click on **use this workflow** (it automatically changes from light blue to green) and then proceed by selecting this service.

.. image:: WorkflowSelection2.png
    :width: 1200px

Then repeat the same procedure for the English part.

.. image:: WorkflowSelection4.png
    :width: 1200px

A new window appears, asking you to review the workflows you have selected before submitting them.

.. image:: ReviewWorkflows.png
    :width: 1200px

As soon as you click the *Submit* button, a message appears informing you that you will be notified by email when the processing is over.

.. image:: Notification.png
    :width: 1200px

Step 3. Get the processed files
================================

You will be notified by email once the processing is finished. To see the results, go to your dashboard and check the :ref:`Processing tasks`.

.. attention:: A metadata record with the annotated data is **automatically** created and the resource is published to the inventory.

.. _Function:

2. Starting with the Function
*********************************

.. _Workflow:

2.1 Workflow Registry
=======================

You can access the workflow registry either from the :ref:`inventory home page` or from your :ref:`dashboard`.

.. image:: WorkflowRegistryNew.png
    :width: 1200px

At the moment, **nine** functions are offered.

.. image:: FunctionsNew.png
    :width: 1200px

For each function, `CLARIN:EL `_ offers a number of workflows, as shown in the image below.

.. image:: SentenceSplittingWorkflows.png
    :width: 1200px

The **basic steps** to use the workflow registry are the following:

Step 1. Select a function
---------------------------

Select a function, according to the type of processing you want to perform, by clicking on its name. The selected function (e.g. tokenization) changes colour from blue to orange.

Step 2. Select a workflow
---------------------------

For tokenization there are multiple available workflows: several for Greek corpora, one for English, one for German and one for Portuguese. Select the workflow you want by clicking on :guilabel:`Use this workflow`.

.. image:: Tokenization.png
    :width: 1200px

Step 3. Upload your data
---------------------------

In the new window, you are informed about the prerequisites of the processing, i.e. the specifications of the dataset to be uploaded. If you wish to process your own dataset, it needs to fulfil these conditions; then, you can upload it.

.. attention:: You can **only** upload *monolingual corpora* in **Greek**, **English**, **German** or **Portuguese**. The workflows can also process the bilingual corpora of the infrastructure which are tagged as :ref:`processable`.

.. image:: TokenizationUpload.png
    :width: 1200px

After the dataset has been successfully uploaded, the **next** button is activated and you can click on it.

.. image:: TokenizationStarted.png
    :width: 1200px

Step 4. Get the processed files
--------------------------------

You will be notified by email once the processing is finished. To see the results, go to your dashboard and check the :ref:`Processing tasks`.

.. attention:: The data uploaded for processing and the data which result from the processing are **not stored** permanently in the infrastructure; the CLARIN:EL policy is to delete the annotated data 48 hours after processing has been completed. If you wish to download them, please do so within this time frame.

.. _ToolService:

2.2 Processing Services
=========================

Go to the central inventory and apply the processing service filter. You will be presented with all the available services in the infrastructure. To use them, follow these **basic steps**:

.. image:: ProcessingService.png
    :width: 1200px

Step 1. Select a service
----------------------------

Click on the name of the service you would like to use. You will be transferred to the resource view page. Move to the lower section of the page and choose the **Access** tab.

.. image:: UseService.png
    :width: 1200px

.. _ServiceUse:

Click on the :guilabel:`Use` button. In the next window, you will be presented with the workflow created for the service you chose. You must click on **Use this workflow**.

.. image:: UseWorkflow.png
    :width: 1200px

Step 2. Upload your data
---------------------------

In the new window, you are informed about the prerequisites of the processing, i.e. the specifications of the dataset to be uploaded. If you wish to process your own dataset, it needs to fulfil these conditions; then, you can upload it.

.. image:: UseWorkflow3.png
    :width: 1200px

After the dataset has been successfully uploaded, the **next** button is activated and you can click on it.

.. image:: TokenizationStarted.png
    :width: 1200px

Step 3. Get the processed files
--------------------------------

You will be notified by email once the processing is finished. To see the results, go to your dashboard and check the :ref:`Processing tasks`.

.. attention:: The data uploaded for processing and the data which result from the processing are **not stored** permanently in the infrastructure; the CLARIN:EL policy is to delete the annotated data 48 hours after processing has been completed. If you wish to download them, please do so within this time frame.

.. [#] All the corpora which meet these criteria are indicated as :guilabel:`processable`. They are presented on the inventory home page as a `preselection `_ which leads to the central inventory.

.. [#] The tag is also found in the resource snippet in the :ref:`central inventory` and on the resource :ref:`view page`.

.. [#] If you are not :ref:`signed in`, the button prompts you to do so (:guilabel:`Sign in to process`). After signing in, you are redirected to the resource view page, where the :guilabel:`Process` button appears.