Commit 20e79cd

dterefe committed
Merge remote-tracking branch 'origin/main'
2 parents 3bc6809 + 9b54c45 commit 20e79cd

17 files changed, with 286 additions and 189 deletions

README.md

Lines changed: 18 additions & 96 deletions
@@ -1,114 +1,36 @@
1-
# DUUI-Gateway
1+
![DUUIGatewayImage](page/docs/images/DUUI.svg)
2+
<span style="font-size:5em; display:inline;">Gateway</span>
23

3-
![GitHub License](https://img.shields.io/github/license/Texttechnologylab/DUUI-Gateway)
4-
5-
[![Discord-Server](http://img.shields.io/badge/Join-Discord_Server-fc0098.svg)](https://discord.gg/DxsgfbK7Jh)
6-
7-
8-
## Introduction
9-
10-
Automatic analysis of large text corpora is a complex task. This complexity
11-
particularly concerns the question of time efficiency. Furthermore, efficient,
12-
flexible, and extensible text analysis requires the continuous integration of
13-
new text analysis tools. Since there are currently, in the area of NLP and
14-
especially in the application context of UIMA, only very few to no adequate
15-
frameworks for these purposes, which are not simultaneously outdated or can no
16-
longer be used for security reasons, this work will present a new approach to fill
17-
this gap.
18-
19-
## Pipeline
20-
21-
A pipeline is a collection of components or Analysis Engines that can be executed.
22-
During an analysis process, the components in the pipeline are executed one after
23-
another annotating documents. Pipelines do not interact with the input data directly
24-
but build the structure for an NLP workflow.
25-
26-
Creating a pipeline with this web-interface can be done in the Builder.
27-
It is a three-step form that guides you through building a pipeline either from scratch or
28-
using a template as the starting point.
29-
30-
>Choosing a template as a starting point copies all predefined settings into a fresh
31-
pipeline.
32-
33-
In the second step pipeline specific properties like name, description, tags and settings can be edited.
34-
Only a name is required to proceed but adding a short description is recommended to serve as documentation
35-
and help others when sharing a pipeline. Tags can help document and find pipelines
36-
in the Dashboard.
37-
38-
## Component
394

40-
Components are the part of DUUI that actually do the processing and therefore offer
41-
the most settings. When creating a pipeline you can choose from a set of predefined
42-
components or create your own. Once added to the pipeline, a component can be edited
43-
by clicking the <img src="./images/fa-edit.svg" width="14"> icon. This will open a drawer on
44-
the right, that allows for modification of a component.
455

46-
Settings include:
47-
48-
**Name**
49-
50-
**Driver** &mdash; The Driver is responsible for the instantiation
51-
of a component during a process.
52-
53-
**Target** &mdash; The component's target depends on the selected
54-
driver. For Docker, Kubernetes and Swarm Drivers, the target is the full image name.
55-
For UIMA it is the class path to the Annotator represented by this component and for
56-
a Remote Driver the URL has to be specified.
57-
58-
**Tags**
59-
60-
**Description**
61-
62-
**Options**
63-
64-
**Parameters**
65-
66-
Options are specific to the selected driver. Most of the time the default options
67-
are sufficient and modifications are only for special use cases. Parameters are
68-
useful if the component requires settings that are not controlled by DUUI.
69-
70-
>When editing a specific pipeline, clicking the <img src="./images/fa-clone.svg" width="14"> icon
71-
clones the component's settings and prefills the creation form.
6+
![GitHub License](https://img.shields.io/github/license/Texttechnologylab/DUUI-Gateway)
727

73-
## Process
8+
[![Discord-Server](http://img.shields.io/badge/Join-Discord_Server-fc0098.svg)](https://discord.gg/DxsgfbK7Jh)
749

75-
A process manages the flow of data and pipeline execution. Starting a process is
76-
possible on a pipeline page. On the process creation screen you are asked to select
77-
an input, output and optionally settings that influence the process behavior.
7810

79-
### Input and Output
11+
### About
12+
The **Docker Unified UIMA Interface – Gateway** (**DUUIGateway** for short) is a web and REST-based software solution for encapsulating and utilising the [Docker Unified UIMA Interface](https://github.com/texttechnologylab/DockerUnifiedUIMAInterface), a Big Data NLP framework for the automatic processing of heterogeneous NLP tools, based on UIMA and using microservices such as Docker or Kubernetes.
8013

81-
Any process must be provided with an input source to be started. Each requires
82-
different properties to be set. The available input sources are:
14+
**DUUI** as well as **DUUIGateway** are developed and maintained at the **Texttechnologylab** ([TTLab](https://www.texttechnologylab.org/)) at the Goethe University Frankfurt.
8315

84-
#### Text
8516

86-
For simple and quick analysis you can choose to process plain text. The text
87-
to be analyzed can be entered in a text area.
17+
## Introduction
8818

89-
#### File
19+
Automatic analysis of large text corpora is a complex task, particularly with regard to time efficiency. Furthermore, efficient, flexible, and extensible text analysis requires the continuous integration of new text analysis tools. In the area of NLP, and especially in the application context of UIMA, there are currently few if any adequate frameworks for these purposes that are not outdated or unusable for security reasons; this work presents a new approach to fill this gap.
9020

91-
Selecting file as the input source allows for the upload of one or multiple
92-
files.
21+
DUUIGateway is a tool that completely encapsulates DUUI and makes it usable through a web interface as well as through an API.
9322

94-
#### Cloud
9523

96-
There are currently four cloud storage providers available to use: Dropbox and
97-
Min.io (s3), Google Drive, and NextCloud. More will be added in the future. To use your cloud storage
98-
provider of choice, a connection must be established on your Account page.
24+
## Team
9925

100-
>With the exception of text, all input sources require a file extension to be
101-
selected.
26+
- Cedric Borkowski [:fontawesome-brands-github:](https://github.com/CedricBorko)
27+
- Prof. Dr. Alexander Mehler (Leader TTLab) [:fontawesome-brands-github:](https://github.com/amehler) [:fontawesome-brands-researchgate:](https://www.researchgate.net/profile/Alexander-Mehler-2)
28+
- Giuseppe Abrami [:fontawesome-brands-github:](https://github.com/abrami) [:fontawesome-brands-researchgate:](https://www.researchgate.net/profile/Giuseppe-Abrami)
29+
- Dawit Terefe [:fontawesome-brands-github:](https://github.com/dterefe)
10230

103-
### Settings
10431

105-
Settings can be changed for both the input and output. Their main purpose is to
106-
filter the files that are processed. This can be done by setting a minimum file
107-
size or ignoring files that may be at the output location.
32+
## Usage & Support
10833

109-
Process related settings include the option to use multiple workers for parallel
110-
processing or ignoring errors that occur by skipping to the next document instead of
111-
failing the entire pipeline.
34+
To use DUUIGateway, you only need Docker or Podman to run a Compose setup. After successful setup, extensive documentation is available in DUUIGateway (cf. [Documentation](https://duui.texttechnologylab.org/documentation)).
11235

113-
Note that the amount of workers or threads that can be used is limited by the
114-
system!
36+
For support, please contact our [team](#team) or use our dedicated [![Discord-Server](http://img.shields.io/badge/Discord-Server-fc0098.svg)](https://discord.gg/DxsgfbK7Jh)
