Commit 8a7b677

Merge branch 'master' into dev-frontend
2 parents a8dabb8 + da8534d commit 8a7b677

62 files changed, with 1750 additions and 360 deletions.
File renamed without changes.

.github/ISSUE_TEMPLATE/01-question.md

Lines changed: 21 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,21 @@
1+
---
2+
name: "\U0001F4AC Question"
3+
about: For the question to confirm suspiciously behaviors or feature usage. Please use StackOverflow if your question is general usage or help with your environment
4+
5+
---
6+
7+
How to reproduce the behaviour
8+
---------
9+
<!-- Before submitting an issue, make sure to check the docs and closed issues and FAQ to see if any of the solutions work for you. https://github.com/chakki-works/doccano/wiki/Frequently-Asked-Questions -->
10+
11+
<!--
12+
Include a code example or the steps that led to the problem. Please try to be as specific as possible. -->
13+
14+
Your Environment
15+
---------
16+
<!-- Include details of your environment. -->
17+
18+
* Operating System:
19+
* Python Version Used:
20+
* When you install doccano:
21+
* How did you install doccano (Heroku button etc):

.github/ISSUE_TEMPLATE/02-bug.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
---
2+
name: "\U0001F6A8 Bug Report"
3+
about: For the bug report or unexpected behavior differing from the docs
4+
5+
---
6+
7+
How to reproduce the behaviour
8+
---------
9+
<!-- Before submitting an issue, make sure to check the docs and closed issues and FAQ to see if any of the solutions work for you. https://github.com/chakki-works/doccano/wiki/Frequently-Asked-Questions -->
10+
11+
<!-- Include a code example or the steps that led to the problem. Please try to be as specific as possible. -->
12+
13+
Your Environment
14+
---------
15+
<!-- Include details of your environment.-->
16+
* Operating System:
17+
* Python Version Used:
18+
* When you install doccano:
19+
* How did you install doccano (Heroku button etc):

.github/ISSUE_TEMPLATE/03-install.md

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
---
2+
name: "\U000023F3 Installation Problem"
3+
about: For the problem that you faced when installing doccano which none of the suggestions in the docs and other issues helped
4+
5+
---
6+
7+
<!-- Before submitting an issue, make sure to check the docs and closed issues and FAQ to see if any of the solutions work for you. https://github.com/chakki-works/doccano/wiki/Frequently-Asked-Questions -->
8+
9+
How to reproduce the problem
10+
---------
11+
<!-- Include the details of how the problem occurred. Which option did you choose to install doccano? Did you come across an error? What else did you try? -->
12+
13+
```bash
14+
# copy-paste the error message here
15+
```
16+
17+
Your Environment
18+
---------
19+
<!-- Include details of your environment.-->
20+
* Operating System:
21+
* Python Version Used:
22+
* When you install doccano:
23+
* How did you install doccano (Heroku button etc):

.github/ISSUE_TEMPLATE/04-request.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
---
2+
name: "\U0001F381 Feature Request"
3+
about: For the proposal to improve or enhance doccano
4+
5+
---
6+
7+
Feature description
8+
---------
9+
<!-- Please describe the feature: Which area of the library is it related to? What specific solution would you like? -->

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -199,4 +199,4 @@ node_modules/
199199
bundle/
200200
webpack-stats.json
201201

202-
.vscode/
202+
.vscode

.travis.yml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,16 +41,19 @@ deploy:
4141
- provider: script
4242
script: tools/cd.sh travis-${TRAVIS_BUILD_NUMBER}
4343
on:
44+
repo: chakki-works/doccano
4445
branch: master
4546

4647
- provider: script
4748
script: tools/cd.sh ${TRAVIS_TAG}
4849
on:
50+
repo: chakki-works/doccano
4951
tags: true
5052

5153
- provider: pages
5254
skip_cleanup: true
5355
github_token: $GITHUB_TOKEN
5456
local_dir: site
5557
on:
58+
repo: chakki-works/doccano
5659
branch: master

Dockerfile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ FROM python:${PYTHON_VERSION}-stretch AS builder
44
ARG NODE_VERSION="8.x"
55
RUN curl -sL "https://deb.nodesource.com/setup_${NODE_VERSION}" | bash - \
66
&& apt-get install --no-install-recommends -y \
7-
nodejs=8.16.0-1nodesource1
7+
nodejs
88

99
COPY tools/install-mssql.sh /doccano/tools/install-mssql.sh
1010
RUN /doccano/tools/install-mssql.sh --dev

README.md

Lines changed: 68 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -3,27 +3,27 @@
33
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/98a0992c0a254d0ba23fd75631fe2907)](https://app.codacy.com/app/Hironsan/doccano?utm_source=github.com&utm_medium=referral&utm_content=chakki-works/doccano&utm_campaign=Badge_Grade_Dashboard)
44
[![Build Status](https://travis-ci.org/chakki-works/doccano.svg?branch=master)](https://travis-ci.org/chakki-works/doccano)
55

6-
doccano is an open source text annotation tool for human. It provides annotation features for text classification, sequence labeling and sequence to sequence. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create project, upload data and start annotation. You can build dataset in hours.
6+
doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence to sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours.
77

88
## Demo
99

10-
You can enjoy [annotation demo](http://doccano.herokuapp.com).
10+
You can try the [annotation demo](http://doccano.herokuapp.com).
1111

1212
### [Named entity recognition](https://doccano.herokuapp.com/demo/named-entity-recognition/)
1313

14-
First demo is one of the sequence labeling tasks, named-entity recognition. You just select text spans and annotate it. Since doccano supports shortcut key, so you can quickly annotate text spans.
14+
The first demo is a sequence labeling task: named-entity recognition. You just select text spans and annotate them. Doccano supports shortcut keys, so you can quickly annotate text spans.
1515

1616
![Named Entity Recognition](./docs/named_entity_annotation.gif)
1717

1818
### [Sentiment analysis](https://doccano.herokuapp.com/demo/text-classification/)
1919

20-
Second demo is one of the text classification tasks, topic classification. Since there may be more than one category, you can annotate multi-labels.
20+
The second demo is a text classification task: sentiment analysis. Since there may be more than one category, you can annotate with multiple labels.
2121

2222
![Text Classification](./docs/text_classification.gif)
2323

2424
### [Machine translation](https://doccano.herokuapp.com/demo/translation/)
2525

26-
Final demo is one of the sequence to sequence tasks, machine translation. Since there may be more than one responses in sequence to sequence tasks, you can create multi responses.
26+
The final demo is a sequence to sequence task: machine translation. Since there may be more than one response in sequence to sequence tasks, you can create multiple responses.
2727

2828
![Machine Translation](./docs/translation.gif)
2929

@@ -54,43 +54,52 @@ git push heroku master
5454

5555
Doccano can be deployed to AWS ([Cloudformation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html)) by clicking on the button below:
5656

57-
[![AWS CloudFormation Launch Stack SVG Button](https://cdn.rawgit.com/buildkite/cloudformation-launch-stack-button-svg/master/launch-stack.svg)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://s3-external-1.amazonaws.com/cf-templates-10vry9l3mp71r-us-east-1/20190732wl-new.templatexloywxxyimi&stackName=doccano)
57+
[![AWS CloudFormation Launch Stack SVG Button](https://cdn.rawgit.com/buildkite/cloudformation-launch-stack-button-svg/master/launch-stack.svg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=doccano&templateURL=https://s3-external-1.amazonaws.com/cf-templates-10vry9l3mp71r-us-east-1/2019290i9t-AppSGl1poo4j8qpq)
5858

5959
> Notice: (1) EC2 KeyPair cannot be created automatically, so make sure you have an existing EC2 KeyPair in one region. Or [create one yourself](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair). (2) If you want to access doccano via HTTPS in AWS, here is an [instruction](https://github.com/chakki-works/doccano/wiki/HTTPS-setting-for-doccano-in-AWS).
6060
61-
6261
## Features
6362

64-
* Collaborative annotation
65-
* Multi-Language support
66-
* Emoji :smile: support
67-
* (future) Auto labeling
63+
- Collaborative annotation
64+
- Multi-Language support
65+
- Emoji :smile: support
66+
- (future) Auto labeling
6867

6968
## Requirements
7069

71-
* Python 3.6+
72-
* Django 2.1.7+
73-
* Node.js 8.0+
74-
* Google Chrome(highly recommended)
70+
- Python 3.6+
71+
- Django 2.1.7+
72+
- Node.js 8.0+
73+
- Google Chrome (highly recommended)
7574

7675
## Installation
7776

77+
### Clone repository
78+
7879
First of all, you have to clone the repository:
7980

8081
```bash
8182
git clone https://github.com/chakki-works/doccano.git
8283
cd doccano
8384
```
8485

86+
_Note for Windows developers: Be sure to configure git to correctly handle line endings or you may encounter `status code 127` errors while running the services in future steps. Running with the git config options below will ensure your git directory correctly handles line endings._
87+
88+
```bash
89+
git clone https://github.com/chakki-works/doccano.git --config core.autocrlf=input
90+
```
91+
92+
### Install doccano
93+
8594
To install doccano, there are three options:
8695

87-
**Option1: Pull the production Docker image**
96+
#### Option 1: Pull the production Docker image
8897

8998
```bash
9099
docker pull chakkiworks/doccano
91100
```
92101

93-
**Option2: Setup Python environment**
102+
#### Option 2: Setup Python environment
94103

95104
First we need to install the dependencies. Run the following commands:
96105

@@ -111,7 +120,7 @@ npm run build
111120
cd ..
112121
```
113122

114-
**Option3: Pull the development Docker-Compose images**
123+
#### Option 3: Pull the development Docker-Compose images
115124

116125
```bash
117126
docker-compose pull
@@ -123,7 +132,7 @@ docker-compose pull
123132

124133
Let’s start the development server and explore it.
125134

126-
Depending on your installation method, there are two options:
135+
Depending on your installation method, there are three options:
127136

128137
#### Option 1: Running the Docker image as a Container
129138

@@ -151,6 +160,12 @@ Next we need to create a user who can login to the admin site. Run the following
151160
python manage.py create_admin --noinput --username "admin" --email "[email protected]" --password "password"
152161
```
153162

163+
Create the admin, annotator, and annotation approver roles to assign to users. Run the following command:
164+
165+
```bash
166+
python manage.py create_roles
167+
```
168+
154169
Developers can also validate that the project works as expected by running the tests:
155170

156171
```bash
@@ -162,7 +177,9 @@ Finally, to start the server, run the following command:
162177
```bash
163178
python manage.py runserver
164179
```
180+
165181
Optionally, you can change the bind ip and port using the command
182+
166183
```bash
167184
python manage.py runserver <ip>:<port>
168185
```
@@ -175,50 +192,63 @@ We can use docker-compose to set up the webpack server, django server, database,
175192
docker-compose up
176193
```
177194

178-
Now, open a Web browser and go to <http://127.0.0.1:8000/login/>. You should see the login screen:
195+
_Note the superuser account credentials located in the `docker-compose.yaml` file:_
196+
```yml
197+
ADMIN_USERNAME: "admin"
198+
ADMIN_PASSWORD: "password"
199+
```
200+
201+
### Confirm all doccano services are running
202+
Open a Web browser and go to <http://127.0.0.1:8000/login/>. You should see the login screen:
179203
180204
<img src="./docs/login_form.png" alt="Login Form" width=400>
181205
182206
### Create a project
183207
184208
Now, try logging in with the superuser account you created in the previous step. You should see the doccano project list page:
185209
186-
<img src="./docs/projects.png" alt="projects" width=600>
210+
<img src="./docs/projects.png" alt="Projects page" width=600>
187211
188212
There is no project created yet. To create your project, make sure you’re in the project list page and select `Create Project` button. You should see the following screen:
189213

190214
<img src="./docs/create_project.png" alt="Project Creation" width=400>
191215

192-
In this step, you can select three project types: text classificatioin, sequence labeling and sequence to sequence. You should select a type with your purpose.
216+
In this step, you can select three project types: text classification, sequence labeling and sequence to sequence. You should select a type with your purpose.
193217

194218
### Import Data
195219

196220
After creating a project, you will see the "Import Data" page, or click `Import Data` button in the navigation bar. You should see the following screen:
197221

198222
<img src="./docs/upload.png" alt="Upload project" width=600>
199223

200-
You can upload two types of files:
201-
- `CSV file`: file must contain a header with a `text` column or be one-column csv file.
202-
- `JSON file`: each line contains a JSON object with a `text` key. JSON format supports line breaks rendering.
224+
You can upload the following types of files (depending on project type):
225+
226+
- `Text file`: file must contain one sentence/document per line separated by new lines.
227+
- `CSV file`: file must contain a header with `"text"` as the first column or be one-column csv file. If using labels the second column must be the labels.
228+
- `Excel file`: file must contain a header with `"text"` as the first column or be one-column excel file. If using labels the second column must be the labels. Supports multiple sheets as long as format is the same.
229+
- `JSON file`: each line contains a JSON object with a `text` key. JSON format supports line breaks rendering.
203230

204231
> Notice: Doccano won't render line breaks in annotation page for sequence labeling task due to the indent problem, but the exported JSON file still contains line breaks.
205232

206-
`example.txt` (or `example.csv`)
207-
```python
233+
`example.txt/csv/xlsx`
234+
235+
```txt
208236
EU rejects German call to boycott British lamb.
209237
President Obama is speaking at the White House.
210238
He lives in Newark, Ohio.
211239
...
212240
```
241+
213242
`example.json`
243+
214244
```JSON
215245
{"text": "EU rejects German call to boycott British lamb."}
216246
{"text": "President Obama is speaking at the White House."}
217247
{"text": "He lives in Newark, Ohio."}
218248
...
219249
```
220250

221-
Any other columns (for csv) or keys (for json) are preserved and will be exported in the `metadata` column or key as is.
251+
Any other columns (for csv/excel) or keys (for json) are preserved and will be exported in the `metadata` column or key as is.
222252

223253
Once you select a TXT/JSON file on your computer, click `Upload dataset` button. After uploading the dataset file, we will see the `Dataset` page (or click `Dataset` button list in the left bar). This page displays all the documents we uploaded in one project.
224254
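
Reviewer note: the import formats described in this hunk can be checked with a minimal Python sketch that builds a CSV payload and a line-delimited JSON payload from the README's example sentences. The `label` header name, the `news` label value, and the helper names `to_csv` and `to_jsonl` are assumptions made for illustration. The README only states that `text` must be the first column and that labels, if used, go in the second.

```python
import csv
import io
import json

# Example sentences taken from the README's example.txt block.
SENTENCES = [
    "EU rejects German call to boycott British lamb.",
    "President Obama is speaking at the White House.",
    "He lives in Newark, Ohio.",
]


def to_csv(rows):
    """Render (text, label) pairs as CSV with "text" as the first column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # "label" as the second header name is an assumption, not from the README.
    writer.writerow(["text", "label"])
    writer.writerows(rows)
    return buf.getvalue()


def to_jsonl(texts):
    """Render one JSON object per line, each with a "text" key."""
    return "\n".join(json.dumps({"text": t}) for t in texts) + "\n"


# "news" is a placeholder label used only for this illustration.
csv_data = to_csv([(s, "news") for s in SENTENCES])
jsonl_data = to_jsonl(SENTENCES)
```

Using the `csv` module rather than joining strings by hand matters here because the third sentence contains a comma, which has to be quoted to stay in the `text` column.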

@@ -228,18 +258,23 @@ Click `Labels` button in left bar to define your own labels. You should see the
228258

229259
<img src="./docs/label_editor.png" alt="Edit label" width=600>
230260

261+
### Assign Roles to Users
262+
263+
Click `Users` button in left bar to assign project users to annotator, admin, or annotation approval roles.
264+
265+
<img src="./docs/user_page.png" alt="Assign users to roles on project" width=600>
231266

232267
### Annotation
233268

234269
Now, you are ready to annotate the texts. Just click the `Annotate Data` button in the navigation bar, you can start to annotate the documents you uploaded.
235270

236-
<img src="./docs/annotation.png" alt="Edit label" width=600>
271+
<img src="./docs/annotation.png" alt="Annotate data" width=600>
237272

238273
### Export Data
239274

240275
After the annotation step, you can download the annotated data. Click the `Edit data` button in navigation bar, and then click `Export Data`. You should see below screen:
241276

242-
<img src="./docs/export_data.png" alt="Edit label" width=600>
277+
<img src="./docs/export_data.png" alt="Export data" width=600>
243278

244279
You can export data as CSV file or JSON file by clicking the button. As for the export file format, you can check it here: [Export File Formats](https://github.com/chakki-works/doccano/wiki/Export-File-Formats).
245280

@@ -249,11 +284,14 @@ by adding `external_id` to the imported file. For example:
249284

250285
Input file may look like this:
251286
`import.json`
287+
252288
```JSON
253289
{"text": "EU rejects German call to boycott British lamb.", "meta": {"external_id": 1}}
254290
```
291+
255292
and the exported file will look like this:
256293
`output.json`
294+
257295
```JSON
258296
{"doc_id": 2023, "text": "EU rejects German call to boycott British lamb.", "labels": ["news"], "username": "root", "meta": {"external_id": 1}}
259297
```
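
Reviewer note: the `external_id` round trip shown in this hunk can be sketched in Python as follows. The snippet reads an exported line-delimited JSON payload and maps each `external_id` back to its labels. The function name `labels_by_external_id` is hypothetical, and the sample record is copied from the `output.json` example above. It assumes every exported line is a JSON object shaped like that example.

```python
import json

# Sample export, copied from the output.json example in the README.
EXPORTED = (
    '{"doc_id": 2023, "text": "EU rejects German call to boycott British lamb.", '
    '"labels": ["news"], "username": "root", "meta": {"external_id": 1}}\n'
)


def labels_by_external_id(jsonl_text):
    """Map each record's meta.external_id to its list of labels."""
    result = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        external_id = record.get("meta", {}).get("external_id")
        if external_id is not None:
            result[external_id] = record.get("labels", [])
    return result


mapping = labels_by_external_id(EXPORTED)
```

Records without an `external_id` are skipped rather than raising, so the same function works on exports where only some documents were imported with one.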
@@ -270,7 +308,6 @@ As with any software, doccano is under continuous development. If you have reque
270308

271309
Here are some tips might be helpful. [How to Contribute to Doccano Project](https://github.com/chakki-works/doccano/wiki/How-to-Contribute-to-Doccano-Project)
272310

273-
274311
## Contact
275312

276313
For help and feedback, please feel free to contact [the author](https://github.com/Hironsan).
