Changes from all commits

49 commits
5532f08
GitBook: [master] 2 pages and 5 assets modified
textandtables Jun 15, 2021
e1970a6
GitBook: [master] one page modified
textandtables Jun 15, 2021
f40c4df
Test change
textandtables Jun 15, 2021
4f63ae9
GitBook: [master] one page modified
textandtables Jun 15, 2021
34d7eb7
Suggested updates
textandtables Jun 18, 2021
9370563
Suggested updates
textandtables Jun 18, 2021
fbeba58
Suggested updates
textandtables Jun 18, 2021
cf34669
Suggested updates
textandtables Jun 18, 2021
26fc87d
Suggested updates
textandtables Jun 18, 2021
11d970d
Suggested updates
textandtables Jun 18, 2021
76e58b0
Suggested updates
textandtables Jun 19, 2021
925b674
Suggested updates
textandtables Jun 19, 2021
0478abf
Suggested updates
textandtables Jun 19, 2021
41871eb
Suggested updates
textandtables Jun 19, 2021
54fb74f
Suggested updates
textandtables Jun 19, 2021
d30c4e4
Suggested updates
textandtables Jun 19, 2021
76cb857
Suggested updates
textandtables Jun 20, 2021
0b1b121
Suggested updates
textandtables Jun 20, 2021
711c202
Suggested updates
textandtables Jun 20, 2021
1a6d4c7
Suggested updates
textandtables Jun 20, 2021
f362a28
Suggested updates
textandtables Jun 21, 2021
b6d5e7a
Suggested updates
textandtables Jun 21, 2021
cd2d87f
Suggested updates
textandtables Jun 21, 2021
b7da144
Suggested updates
textandtables Jun 21, 2021
f1548dc
Suggested updates
textandtables Jun 21, 2021
7442504
Suggested updates
textandtables Jun 21, 2021
e6b1229
Suggested updates
textandtables Jun 22, 2021
56e734a
Suggested updates
textandtables Jun 22, 2021
c08f65a
Suggested updates
textandtables Jun 22, 2021
d8abfff
Suggested updates
textandtables Jun 22, 2021
17bcc60
Suggested updates
textandtables Jun 22, 2021
7c4fdce
Suggested updates
textandtables Jun 22, 2021
51abe65
Suggested updates
textandtables Jun 22, 2021
ed5a665
Suggested updates
textandtables Jun 22, 2021
d5a0b9e
Suggested updates
textandtables Jun 22, 2021
3108550
Suggested updates
textandtables Jun 22, 2021
0680a66
Suggested updates
textandtables Jun 22, 2021
0594a76
Suggested updates
textandtables Jun 22, 2021
9a7376f
Suggested updates
textandtables Jun 22, 2021
57f08d5
Suggested updates
textandtables Jun 22, 2021
d38ae77
Suggested updates
textandtables Jun 22, 2021
dda2d79
Suggested updates
textandtables Jun 22, 2021
ca9261f
Suggested updates
textandtables Jun 22, 2021
3de4056
Suggested updates
textandtables Jun 22, 2021
89887ff
Suggested updates
textandtables Jun 22, 2021
0d1c504
Suggested updates
textandtables Jun 22, 2021
f75fc11
Suggested updates
textandtables Jun 22, 2021
0fe80b8
Suggested updates
textandtables Jun 22, 2021
df1b8c6
Suggested updates
textandtables Jun 22, 2021
Binary file removed .gitbook/assets/data-cleaning-workflow.png
Binary file removed .gitbook/assets/image (1).png
Binary file removed .gitbook/assets/image.png
Binary file removed .gitbook/assets/remote.png
Binary file removed .gitbook/assets/simple-workflow.png
6 changes: 1 addition & 5 deletions README.md
@@ -1,6 +1,2 @@
# Welcome to Documentation

## Introduction

Learn how to integrate with Lucidtech's APIs here.
# Initial page

48 changes: 1 addition & 47 deletions SUMMARY.md
@@ -1,50 +1,4 @@
# Table of contents

* [Welcome to Documentation](README.md)

## Getting Started

* [Introduction](introduction/README.md)
* [Documents](introduction/documents.md)
* [Predictions](introduction/predictions.md)
* [Transitions and Workflows](introduction/transitions_and_workflows.md)
* [Assets and Secrets](introduction/assets_and_secrets.md)
* [Logs](introduction/logs.md)
* [Models](introduction/models.md)
* [Batches and Consents](introduction/batches_and_consents.md)
* [Quickstart](getting-started/dev/README.md)
* [Using the CLI](getting-started/dev/cli.md)
* [Python](getting-started/dev/python.md)
* [JavaScript](getting-started/dev/js.md)
* [.NET](getting-started/dev/net.md)
* [Java](getting-started/dev/java.md)
<!-- * [Simple demo workflow](getting-started/simple_demo_workflow.md) -->
* [Tutorials](tutorials/README.md)
* [Setup workflow](tutorials/setup_predict_and_approve.md)
* [Setup data cleaning workflow](tutorials/data_cleaning.md)
* [Setup approve view](tutorials/setup_approve_view.md)
* [Setup docker transition](tutorials/create_your_own_docker_transition.md)
* [Docker samples](docker-image-samples/README.md)
* [Authentication](authentication/README.md)
* [Quotas](quotas/README.md)
* [FAQ](getting-started/faq.md)
* [Help](getting-started/help.md)

## Data Training

* [Custom Data Training](data-training/data-training.md)
* [What is Confidence?](data-training/confidence.md)

## Reference

* [Rest API](reference/restapi/README.md)
* [latest](reference/restapi/latest/README.md)
* [Python SDK](reference/python/README.md)
* [latest](reference/python/latest.md)
* [.NET SDK](reference/dotnet/README.md)
* [latest](reference/dotnet/latest.md)
* [JavaScript SDK](reference/js/README.md)
* [latest](reference/js/latest.md)
* [Java SDK](reference/java/README.md)
* [latest](reference/java/latest.md)
* [Initial page](README.md)

50 changes: 26 additions & 24 deletions authentication/README.md
@@ -1,44 +1,46 @@
## Authenticating to Lucidtech

Lucidtech APIs require you to authenticate using the OAuth2 [protocol](https://tools.ietf.org/html/rfc6749). Our SDKs
will typically handle authentication for you but should you wish to use the REST API, you would need to do this
yourself. Here is a brief introduction to get you started
Lucidtech APIs require you to authenticate using the [OAuth2 protocol](https://tools.ietf.org/html/rfc6749). Our SDKs
will typically handle authentication for you, but should you wish to use the REST API (TTNote: Consider a link here), you would need to do this
yourself. Here is a brief introduction to get you started.


#### Credentials
You should already have acquired a client id, client secret and api key before continuing. The client id and client
secret will be used to get an access token from the auth endpoint and the api key will be used together with the
You should already have acquired a *client id*, *client secret* and *api key* before continuing. (TTNote: Consider info here on what to do if you don't
have these items.) The *client id* and *client
secret* will be used to get an access token from the *auth endpoint*, and the *api key* will be used together with the
access token to authorize to the API.

Unless specified otherwise in the credentials file you have received, the endpoint for authentication is
https://auth.lucidtech.ai and the endpoint for the API is https://api.lucidtech.ai
Unless specified otherwise in the credentials file that you have received (TTNote: Is this received upon purchase of the API?
If not consider info on how to obtain the file), the endpoint for authentication is
https://auth.lucidtech.ai and the endpoint for the API is https://api.lucidtech.ai.

#### Getting an access token

To acquire an access token we need to ask the auth endpoint with our client id and client secret for access. This is
done by performing a HTTP POST request to the token endpoint /oauth2/token with two headers provided. One header
should be 'Authorization' with base64 encoded client_id and client secret and one header should be 'Content-Type' which
will always contain the same value 'application/x-www-form-urlencoded'.
To acquire an access token, we need to ask the *auth endpoint* for access using our *client id* and *client secret*. This is
done by performing an HTTP POST request to the token endpoint /oauth2/token with two headers included. One header
should be *Authorization*, containing the base64-encoded *client_id*:*client secret* pair. The other header should be *Content-Type*, which
will always contain the same value: 'application/x-www-form-urlencoded'.

| Header name | Header value |
| ----------- | ------------------------------------------- |
| Authorization | Basic Base64Encode(client_id:client_secret) |
| Content-Type | application/x-www-form-urlencoded |

Read more about Base64Encode [here](https://en.wikipedia.org/wiki/Basic_access_authentication#Client_side)
You can read more about Base64Encode [here](https://en.wikipedia.org/wiki/Basic_access_authentication#Client_side).

Since we are dealing with 'client_credentials' we need to specify this in the url as a query parameter. The final URL
to make the request to is https://auth.lucidtech.ai/oauth2/token?grant_type=client_credentials
Since we are working with `client_credentials`, we need to specify this in the URL as a query parameter. The final URL
to make the request to is https://auth.lucidtech.ai/oauth2/token?grant_type=client_credentials.

Here is an example getting access token using curl in bash.
Here is an example of obtaining the access token using curl in bash:

```bash
$ credentials="<your client id here>:<your client secret here>"
$ base64_encoded_credentials=`echo -n $credentials | base64 -w 0`
$ curl -X POST https://auth.lucidtech.ai/oauth2/token?grant_type=client_credentials -H "Content-Type: application/x-www-form-urlencoded" -H "Authorization: Basic $base64_encoded_credentials"
```
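For comparison, here is a rough Python equivalent of the curl call above. It is a sketch only, assuming the third-party `requests` library is installed; the endpoint, grant type and headers are taken from the text, while the `access_token` field name is assumed to follow the standard OAuth2 response format.

```python
import base64

import requests

CLIENT_ID = "<your client id here>"          # placeholder: use your own credentials
CLIENT_SECRET = "<your client secret here>"  # placeholder

# Base64-encode "client_id:client_secret" for the Basic Authorization header
basic_auth = base64.b64encode(f"{CLIENT_ID}:{CLIENT_SECRET}".encode()).decode()

response = requests.post(
    "https://auth.lucidtech.ai/oauth2/token",
    params={"grant_type": "client_credentials"},
    headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Authorization": f"Basic {basic_auth}",
    },
)
response.raise_for_status()
access_token = response.json()["access_token"]  # assumption: standard OAuth2 response field
```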

If everything is working as expected, the response should look similar to the following
If everything is working as expected, the response should look like this:

```json
{
@@ -49,15 +51,16 @@ If everything is working as expected, the response should look similar to the fo
```

{% hint style="info" %}
The access token will expire after some time, currently after 3600 seconds (1 hour). When the token expires
The access token will expire after some time, currently after 3600 seconds (1 hour). When the token expires,
you will need to get a new access token using the same procedure.
{% endhint %}

#### Calling the API

Upon successfully acquiring access token from previous step, we are ready to call the API! To do that we need to
provide two headers to the API. One header 'x-api-key' with our api key and one header 'Authorization' with the
newly acquired access token.
Upon successfully acquiring the access token from the previous step, you are ready to call the API. To do this, you need to
provide two headers to the API. The first header will be *x-api-key*, which will contain the api key, and the other header
will be *Authorization*, which will contain the newly acquired access token:

| Header name | Header value |
| ----------- | ------------------------------------- |
@@ -72,16 +75,15 @@ $ curl https://api.lucidtech.ai/v1/documents -H "x-api-key: $api_key" -H "Author
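And here is a hedged Python sketch of the same call as the curl shown in the hunk above, again assuming the `requests` library; the `Bearer` scheme for the Authorization header is an assumption based on standard OAuth2 usage rather than something confirmed by the text.

```python
import requests

API_KEY = "<your api key here>"                     # placeholder
ACCESS_TOKEN = "<access token from the previous step>"  # placeholder

response = requests.get(
    "https://api.lucidtech.ai/v1/documents",
    headers={
        "x-api-key": API_KEY,
        # Assumption: Bearer scheme, as is standard for OAuth2 access tokens
        "Authorization": f"Bearer {ACCESS_TOKEN}",
    },
)
response.raise_for_status()
print(response.json())
```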

#### Using an SDK

Our SDKs will handle acquiring access token for you. The only thing you need to do is put the credentials
in a file in the correct location on your computer and the SDK will discover them. The credentials file should
be placed on the following location based on the OS you are running
Our SDKs will acquire the access token for you.
Simply enter the credentials into the credentials.cfg file in the location shown below based on your OS, and the SDK will auto-discover them:

| Operating System | Location |
| ---------------- | ----------------------------------------------------------------------------- |
| Linux/Mac | ~/.lucidtech/credentials.cfg or $HOME/.lucidtech/credentials.cfg |
| Windows | %USERPROFILE%\.lucidtech\credentials.cfg or %HOME%\.lucidtech\credentials.cfg |

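To illustrate how this auto-discovery might work, here is a rough Python sketch that locates and parses the credentials file from the paths above. It is illustrative only: the `default` section name and the key names are assumptions and may not match what the SDK actually expects.

```python
import configparser
import os
from pathlib import Path


def load_credentials() -> dict:
    """Locate and parse ~/.lucidtech/credentials.cfg (illustrative sketch only)."""
    # On Windows, prefer %USERPROFILE%; otherwise fall back to the home directory
    home = Path(os.environ.get("USERPROFILE") or Path.home())
    path = home / ".lucidtech" / "credentials.cfg"

    config = configparser.ConfigParser()
    config.read(path)

    # Assumption: section and key names below are hypothetical; check your own file
    section = config["default"]
    return {
        "client_id": section.get("client_id"),
        "client_secret": section.get("client_secret"),
        "api_key": section.get("api_key"),
    }
```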
The credentials.cfg file should look like the following
The credentials.cfg file should look like the following:

```ini
[default]
6 changes: 3 additions & 3 deletions data-training/confidence.md
@@ -6,16 +6,16 @@ description: What you need to know about confidence

## End-to-end confidence

Every field the model extracts has a corresponding confidence value. The confidence is different from a traditional OCR confidence in that it does not only estimates the probability that the characters are interpreted correctly, but also that it has extracted the correct information (e.g. the total amount and not the VAT amount).
Every field that the model extracts has a corresponding confidence value. The confidence is different from a traditional OCR confidence in that it estimates not only the probability that the characters are interpreted correctly, but also the probability that the correct information has been extracted (e.g. the total amount and not the VAT amount).

![The figure shows example predictions together with confidence values](confidence1.png)

# End-to-end confidence increases automation

## You can trust that the model is correct when it says so.
## You can trust that the model is correct when it says so

When the confidence of a prediction is above a given threshold, the field can be hidden from the human validator.

This ensures that that only fields that the AI is uncertain about will be manually inspected, while the rest of the fields are fully automated. This means that users will save time and cost by not having to validate high-confidence predictions!
This ensures that only fields that the model is uncertain about will be manually inspected, while the rest of the fields are fully automated. This means that users will save time and cost by not having to validate high-confidence predictions.
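As a purely illustrative sketch of this thresholding idea (the field names, response shape and threshold value below are assumptions, not the actual API schema):

```python
# Hypothetical predictions; the real response schema may differ.
predictions = [
    {"label": "total_amount", "value": "1200.00", "confidence": 0.998},
    {"label": "due_date", "value": "2021-06-22", "confidence": 0.62},
]

CONFIDENCE_THRESHOLD = 0.95  # assumption: threshold chosen by the integrator

# Only low-confidence fields are sent to a human validator;
# high-confidence fields are accepted automatically.
auto_accepted = [p for p in predictions if p["confidence"] >= CONFIDENCE_THRESHOLD]
needs_review = [p for p in predictions if p["confidence"] < CONFIDENCE_THRESHOLD]
```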

![The figure shows how confidence can be used to automate validation of data extraction](confidence2.png)
22 changes: 12 additions & 10 deletions data-training/data-training.md
@@ -6,48 +6,50 @@ description: Getting started with custom data training.

![Data Training](https://lucidtech.ai/assets/img/illustrations/data-training.png)

Lucidtech offers APIs for document data extracting. The core technology is a general machine learning architecture which can be used to interpret a wide range of document types, including invoices, receipts, ID-documents, purchase orders or virtually any other type of document.
Lucidtech offers APIs for document data extraction. The core technology is a general machine learning architecture which can be used to interpret a wide range of document types, including invoices, receipts, ID documents, purchase orders and virtually any other type of document.

To make sure that our API provides optimal accuracy we train our models on your data. We use supervised learning for training our machine learning models. This means that the algorithms learn by observing thousands of examples of documents together with their ground truth. The goal of the training process is that Lucidtech's models learn to produce the correct output for new and previously unseen documents.
To make sure that our API provides optimal accuracy, we train our models on your data. We use supervised learning for training our machine learning models. This means that the algorithms learn by observing thousands of document examples together with their ground truth. The goal of the training process is that Lucidtech's models learn to produce the correct output for new and previously unseen documents.

## 1. Data requirements

### Volume

The amount of data needed to create a high quality model depends on the expected variation of the data as well as the quality of the training data. As a general rule of thumb we require at least 10 000 documents when training a new model, but 30 000+ documents is recommended for an optimal result. When the API is deployed in production, the _feedback endpoints_ should be used to enable continuous training on new data.
The amount of data needed to create a high-quality model depends on the expected variation of the data as well as the quality of the training data. As a general rule of thumb, we require at least 10,000 documents when training a new model, but 30,000+ documents is recommended for an optimal result. When the API is deployed in production, the _feedback endpoints_ should be used to enable continuous training on new data.

### Representative data

The training data should be representative for the expected data. For example, if the expected data consists of invoices from thousands of different vendors, then the training data should not only consist of invoices from five different vendors.
The training data should be representative for the expected data. For example, if the expected data consists of invoices from thousands of different vendors, then the training data should not consist of invoices from only five different vendors.

{% hint style="success" %}
A good way to select representative training data can be to choose data randomly from your database or document archive.
A good way to select representative training data is to choose data randomly from your database or document archive.
{% endhint %}

### Correctness of data

Incorrect or missing ground truth information can be detrimental to the training process. For this reason it is important that the training data is as accurate as possible.
Incorrect or missing ground truth information can be detrimental to the training process. For this reason, it is important that the training data be as accurate as possible.

### Consistency

Ground truth data should adhere to a common format. For example, when extracting dates, all ground truth dates should be listed on the same date format regardless of how the date appears in the document. Examples of inconsistencies:
Ground truth data should adhere to a common format. For example, when extracting dates, all ground truth dates should be listed in the same date format, regardless of how the date appears in the document. Examples of inconsistencies:

* The same date is written as 17.05.18 in one ground truth file and as 17th of May, 2018 in another.
* Different conventions are used to denote amounts, e.g. 1200.00, 1,200.00 and 1200.

{% hint style="info" %}
Consistency is only required in the ground truth data. The corresponding information as written on the actual documents in the data set may be on arbitrary formats.
Consistency is only required in the ground truth data. The corresponding information as written on the actual documents in the data set may use arbitrary formats.
{% endhint %}

## 2. Data preparation

### Deciding what to extract

The first step is to decide which data fields you want to extract from your documents. For an invoice this can be total amount, due date and bank account, or it can also be only total amount. For an ID document it can be first name, last name, id-number and nationality. For a travel ticket it can be price, departure date, arrival date, seat number and mean of transportation. Which data fields you want to extract is up to you to decide. We generally recommend to keep it as simple as possible. In particular, avoid adding fields that you will not use, and make sure that the majority of the data you provide contain the fields you specify.
The first step is to decide which data fields you want to extract from your documents. For an invoice, this can be total amount, due date and bank account, or it can be just the total amount. For an ID document, it can be first name, last name, id number and nationality. For a travel ticket, it can be price, departure date, arrival date, seat number and means of transportation.

Which data fields you want to extract is up to you to decide. We generally recommend keeping it as simple as possible. In particular, avoid adding fields that you will not use, and make sure that the majority of the data that you provide contain the fields you specify.

### Every document needs a ground truth

To start training your custom model you need pairs of documents and their corresponding ground truth. The ground truth is the information you want to extract from the document. Note that every single document needs its own ground truth file.
To start training your custom model, you need pairs of documents (TTNote: Unclear if you mean 'pairs of documents' or simply 'documents paired with their corresponding ground truth') and their corresponding ground truth. The ground truth is the information that you want to extract from the document. Note that every single document needs its own ground truth file.
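To make this concrete, here is a hypothetical ground truth file for a single invoice, written from Python. The field names and formats are illustrative assumptions chosen to match the consistency guidelines above, not a prescribed schema.

```python
import json

# Hypothetical ground truth for one invoice document (one file per document).
# Note the consistent formats: ISO dates and plain decimal amounts.
ground_truth = {
    "total_amount": "1200.00",
    "due_date": "2018-05-17",
    "bank_account": "12345678901",
}

with open("invoice_0001.json", "w") as f:
    json.dump(ground_truth, f, indent=2)
```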

![Ground Truth](https://lucidtech.ai/assets/img/illustrations/illustration-10.png)

26 changes: 15 additions & 11 deletions docker-image-samples/README.md
@@ -1,6 +1,6 @@
# Docker Image Samples
Docker images are the essence of an automatic transition;
they are the building blocks of a workflow.
Docker images are the essence of an automatic transition.
They are the building blocks of a workflow.


## Introduction
@@ -9,18 +9,22 @@ or as a starting point for a customized step in a workflow.


## Getting started
To make a workflow that consist of the samples in this folder
there is no need to dwell here. Just check out the
[tutorials](https://github.com/LucidtechAI/las-docs/tree/master/tutorials/README.md)
To make a workflow that consists of the samples in this folder,
there is no need to dwell here; just check out our
[tutorials](https://github.com/LucidtechAI/las-docs/tree/master/tutorials/README.md).
(TTNote: Consider if the above link should point to gitbook not github.)

## Sample images
* make-predictions: Get predictions from Lucidtechs world class OCR-models
* feedback-to-model: Make sure the OCR-models stays state-of-the-art by feeding corrected results back to the model
* export-to-semine: One of many standard integration modules.
(TTNote: Should these be images or links to images?)
* make-predictions: Get predictions from Lucidtech's world-class OCR models
* feedback-to-model: Make sure the OCR models stay state-of-the-art by feeding corrected results back to the model
* export-to-semine: One of many standard integration modules

## For developers
When updating an image we use the following repo and naming convention:

## Naming convention
(TTNote: If all of the documentation is for developers, suggestion to relabel this header.)
When updating an image, we use the following repository and naming convention:
```
name=lucidtechai/transition-samples:<folder-name>
docker build . -t $name && docker push $name
```
```
2 changes: 2 additions & 0 deletions getting-started/dev/README.md
@@ -1,2 +1,4 @@
# Quickstart

(TTNote: Suggestion to add some verbiage here to note these are quick start guides depending on the API method used and that more details can be found under the Introduction sections.)
