about: For questions to confirm suspicious behavior or feature usage. Please use StackOverflow if your question is about general usage or help with your environment.

---

How to reproduce the behaviour
---------

<!-- Before submitting an issue, make sure to check the docs and closed issues and FAQ to see if any of the solutions work for you. https://github.com/chakki-works/doccano/wiki/Frequently-Asked-Questions -->

<!--
Include a code example or the steps that led to the problem. Please try to be as specific as possible. -->

Your Environment
---------

<!-- Include details of your environment. -->

* Operating System:
* Python Version Used:
* When did you install doccano:
* How did you install doccano (Heroku button etc):

about: For bug reports or unexpected behavior that differs from the docs

---

How to reproduce the behaviour
---------

<!-- Before submitting an issue, make sure to check the docs and closed issues and FAQ to see if any of the solutions work for you. https://github.com/chakki-works/doccano/wiki/Frequently-Asked-Questions -->

<!-- Include a code example or the steps that led to the problem. Please try to be as specific as possible. -->

Your Environment
---------

<!-- Include details of your environment.-->
* Operating System:
* Python Version Used:
* When did you install doccano:
* How did you install doccano (Heroku button etc):

about: For problems you faced when installing doccano that none of the suggestions in the docs or other issues resolved

---

<!-- Before submitting an issue, make sure to check the docs and closed issues and FAQ to see if any of the solutions work for you. https://github.com/chakki-works/doccano/wiki/Frequently-Asked-Questions -->

How to reproduce the problem
---------

<!-- Include the details of how the problem occurred. Which option did you choose to install doccano? Did you come across an error? What else did you try? -->

```bash
# copy-paste the error message here
```

Your Environment
---------

<!-- Include details of your environment.-->
* Operating System:
* Python Version Used:
* When did you install doccano:
* How did you install doccano (Heroku button etc):

doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence to sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours.

## Demo

You can try the [annotation demo](http://doccano.herokuapp.com).

The first demo is a sequence labeling task: named-entity recognition. You just select text spans and annotate them. Doccano supports shortcut keys, so you can quickly annotate text spans.

The final demo is a sequence to sequence task: machine translation. Since there may be more than one response in sequence to sequence tasks, you can create multiple responses.

@@ -54,43 +54,52 @@ git push heroku master
Doccano can be deployed to AWS ([CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html)) by clicking on the button below:

> Notice: (1) An EC2 KeyPair cannot be created automatically, so make sure you have an existing EC2 KeyPair in one region, or [create one yourself](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair). (2) If you want to access doccano via HTTPS in AWS, see [these instructions](https://github.com/chakki-works/doccano/wiki/HTTPS-setting-for-doccano-in-AWS).

_Note for Windows developers: Be sure to configure git to correctly handle line endings or you may encounter `status code 127` errors while running the services in future steps. Running with the git config options below will ensure your git directory correctly handles line endings._

There is no project created yet. To create your project, make sure you’re on the project list page and select the `Create Project` button. You should see the following screen:

In this step, you can select one of three project types: text classification, sequence labeling and sequence to sequence. Select the type that fits your purpose.

### Import Data

After creating a project, you will see the "Import Data" page. You can also reach it by clicking the `Import Data` button in the navigation bar. You should see the following screen:

You can upload the following types of files (depending on project type):

- `Text file`: the file must contain one sentence/document per line, separated by new lines.
- `CSV file`: the file must contain a header with `"text"` as the first column, or be a one-column CSV file. If using labels, the second column must be the labels.
- `Excel file`: the file must contain a header with `"text"` as the first column, or be a one-column Excel file. If using labels, the second column must be the labels. Multiple sheets are supported as long as the format is the same.
- `JSON file`: each line contains a JSON object with a `text` key. The JSON format supports rendering line breaks.

> Notice: Doccano won't render line breaks on the annotation page for the sequence labeling task due to an indentation problem, but the exported JSON file still contains line breaks.

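As a minimal sketch of the CSV layout described in the list above, the following Python snippet writes a two-column file with `text` as the first header. The `label` header name, the sample labels, and the `rows_to_csv` helper are assumptions for illustration; the description above only requires that labels, if present, sit in the second column.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (text, label) pairs as CSV with `text` as the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # "text" must be the first header; "label" is an assumed name for the second column.
    writer.writerow(["text", "label"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows; the labels are made up for this sketch.
sample = [
    ("EU rejects German call to boycott British lamb.", "news"),
    ("He lives in Newark, Ohio.", "biography"),
]
csv_text = rows_to_csv(sample)
print(csv_text, end="")
```

Using the `csv` module rather than joining strings by hand means that text containing commas is quoted automatically, so it stays in the first column.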
`example.txt/csv/xlsx`

```txt
EU rejects German call to boycott British lamb.
President Obama is speaking at the White House.
He lives in Newark, Ohio.
...
```

`example.json`

```JSON
{"text": "EU rejects German call to boycott British lamb."}
{"text": "President Obama is speaking at the White House."}
{"text": "He lives in Newark, Ohio."}
...
```

Any other columns (for csv/excel) or keys (for json) are preserved and will be exported in the `metadata` column or key as is.

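If your data starts out as plain text, a few lines of Python can produce the JSON format shown above, with one object carrying a `text` key per line. This is only a sketch: the `texts_to_jsonl` helper is a hypothetical name, and skipping blank lines is an assumption about what you want, not something doccano requires.

```python
import json


def texts_to_jsonl(lines):
    """Turn plain text lines into JSON Lines, one {"text": ...} object per line."""
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # assumption: blank lines carry no document, so skip them
        # ensure_ascii=False keeps non-ASCII characters readable in the output
        records.append(json.dumps({"text": text}, ensure_ascii=False))
    return "\n".join(records) + "\n"


jsonl = texts_to_jsonl([
    "EU rejects German call to boycott British lamb.",
    "",
    "He lives in Newark, Ohio.",
])
print(jsonl, end="")
```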
Once you select a TXT/JSON file on your computer, click the `Upload dataset` button. After uploading the dataset file, you will see the `Dataset` page (or click the `Dataset` button in the left bar). This page displays all the documents uploaded to the project.

@@ -228,18 +258,23 @@ Click `Labels` button in left bar to define your own labels. You should see the
Click the `Users` button in the left bar to assign project users to the annotator, admin, or annotation approver roles.

<img src="./docs/user_page.png" alt="Assign users to roles on project" width=600>

### Annotation

Now, you are ready to annotate the texts. Click the `Annotate Data` button in the navigation bar to start annotating the documents you uploaded.

After the annotation step, you can download the annotated data. Click the `Edit data` button in the navigation bar, and then click `Export Data`. You should see the following screen:

You can export the data as a CSV file or a JSON file by clicking the corresponding button. For details on the export file formats, see [Export File Formats](https://github.com/chakki-works/doccano/wiki/Export-File-Formats).

@@ -249,11 +284,14 @@ by adding `external_id` to the imported file. For example:

The input file may look like this:
`import.json`

```JSON
{"text": "EU rejects German call to boycott British lamb.", "meta": {"external_id": 1}}
```

and the exported file will look like this:
`output.json`

```JSON
{"doc_id": 2023, "text": "EU rejects German call to boycott British lamb.", "labels": ["news"], "username": "root", "meta": {"external_id": 1}}
```
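
To link exported annotations back to your own records, you can index the export by `external_id`. The sketch below assumes the export contains one JSON object per line in the shape shown above; the `index_by_external_id` helper is a hypothetical name, not part of doccano.

```python
import json


def index_by_external_id(exported_lines):
    """Map each record's meta.external_id to its list of labels."""
    labels_by_id = {}
    for line in exported_lines:
        record = json.loads(line)
        external_id = record.get("meta", {}).get("external_id")
        if external_id is not None:  # records without an external_id are skipped
            labels_by_id[external_id] = record.get("labels", [])
    return labels_by_id


# The sample line mirrors the `output.json` example above.
exported = [
    '{"doc_id": 2023, "text": "EU rejects German call to boycott British lamb.", '
    '"labels": ["news"], "username": "root", "meta": {"external_id": 1}}',
]
labels = index_by_external_id(exported)
print(labels)
```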
@@ -270,7 +308,6 @@ As with any software, doccano is under continuous development. If you have reque

Here are some tips that might be helpful: [How to Contribute to Doccano Project](https://github.com/chakki-works/doccano/wiki/How-to-Contribute-to-Doccano-Project)

## Contact

For help and feedback, please feel free to contact [the author](https://github.com/Hironsan).