 We are writing up the documentation; please be patient.

@@ -13,39 +12,37 @@ Why TensorDB
 ----------------

 TensorLayer is designed for production and aims to support large-scale machine learning applications.
-TensorDB introduce the database infracture to address the many challenges in large scale machine learning project, such as
+TensorDB introduces database infrastructure to address the many challenges of large-scale machine learning projects, such as:

-1. how to mangage the training data and load the training datasets
+1. How to manage the training data and load the training datasets.
 2. How to cope when the dataset is so large that it exceeds the storage capacity of one computer.
-3. how shoud we managment different models and version, and comparing different models.
+3. How to manage different models and versions, and compare different models.
 4. How to automate the whole process of training, evaluating and deploying machine learning models.

-In the TensorLayer system, we introduce the database technology to the issues above.
+In the TensorLayer system, we apply database technology to the issues above.

 TensorDB is designed by following three principles.

 Everything is Data
 ^^^^^^^^^^^^^^^^^^

-TensorDB is a data warehouse that stores that capture the whole machine learning development process. the data inside tensordb can be catagloried as
-
-1. Data and Labels. Which includes all the data for training, validation and prediction. The labels can be manually labelled or generated by machine
-2. Model Architecture.This group store the different model architecture, which user can select to use
-3. Model Parameters.This tables stores all the model parameters of echo in the training step.
-4. Jobs.all the computation is cutted into several jobs. Each jobs constains some computing work load. for training , the jobs includes training data , the model parameter, the model architecture, how many epochs the training want to do. Similarity are the validation jobs and inference jobs.
-5. Logs. The logs store all the step time and accuracy and other metric of each training steps and also the time stamps.
+TensorDB is a data warehouse that stores and captures the whole machine learning development process. The data inside TensorDB can be categorized as:

-TensorDB is in principal is a key-word based search engine. each model, parameters, or training data are assigned many tags.
-The data are stored in two layers. On the top, there is the index layer, which instore the blob storage reference with all the tags assigned to the data. which is implemented based on NoSQL document database such as mongodb. The second layer is used store big chunk of data, such as videos, medical images or image mask, which is usually implemented as file system. Our open source implementation is implemented based MongoDB. The blob data is in store in the gridfs while the tag index is stored in the documents.
+1. Data and Labels: This includes all the data for training, validation and prediction. The labels can be annotated manually or generated by machine.
+2. Model Architecture: This group stores the different model architectures, which users can select from.
+3. Model Parameters: This table stores the model parameters of each epoch in the training step.
+4. Jobs: All computation is cut into several jobs. Each job contains some computing workload. For training, a job includes the training data, the model parameters, the model architecture, and the number of epochs to train. Validation jobs and inference jobs are defined similarly.
+5. Logs: The logs store the step time, accuracy and other metrics of each training step, together with the timestamps.

+TensorDB is in principle a keyword-based search engine. Each model, parameter set, or training dataset is assigned many tags.
+The data are stored in two layers. On top is the index layer, which stores the blob storage references together with all the tags assigned to the data; it is implemented on a NoSQL document database such as MongoDB. The second layer stores big chunks of data, such as videos, medical images or image masks, and is usually implemented as a file system. Our open source implementation is based on MongoDB: the blob data is stored in GridFS, while the tag index is stored in documents.


 Everything is identified by Query
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-
-Within TensorDB framework, any entity within the data warehouse, such as the data, model or jobs are specified by the database query language.
-The first advantage is the query is more efficient in space and can specify multiple objects in a concise way.
+Within the TensorDB framework, any entity in the data warehouse, such as data, models or jobs, is specified using the database query language.
+The first advantage is that a query is more efficient in space and can specify multiple objects concisely.
 Another advantage of such a design is that it enables a highly flexible software system.
 Data, model architectures and training are interchangeable.
 Much work can be done by simply rewiring different components.
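The two-layer storage and query-based identification described in this hunk can be illustrated with a minimal in-memory sketch. It does not use the TensorDB or MongoDB APIs: the class ``TinyTagStore`` and its methods ``save``, ``find`` and ``load`` are hypothetical names chosen for illustration, a plain list stands in for the document index, and a dict stands in for the blob layer.

```python
import itertools
from typing import Any, Dict, Iterator, List


class TinyTagStore:
    """Toy two-layer store (hypothetical, not the TensorDB API)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        # Index layer: small documents holding tags plus a blob reference.
        self._index: List[Dict[str, Any]] = []
        # Blob layer: the large payloads, addressed only by reference.
        self._blobs: Dict[int, bytes] = {}

    def save(self, blob: bytes, **tags: Any) -> int:
        """Store a blob and index it under the given tags."""
        blob_id = next(self._ids)
        self._blobs[blob_id] = blob
        self._index.append({"blob_id": blob_id, **tags})
        return blob_id

    def find(self, **query: Any) -> Iterator[int]:
        """Yield references of every entry whose tags match the query."""
        for doc in self._index:
            if all(doc.get(key) == value for key, value in query.items()):
                yield doc["blob_id"]

    def load(self, blob_id: int) -> bytes:
        """Fetch the actual payload from the blob layer."""
        return self._blobs[blob_id]


store = TinyTagStore()
store.save(b"weights-a", kind="params", model="cnn", epoch=1)
store.save(b"weights-b", kind="params", model="cnn", epoch=2)
store.save(b"weights-c", kind="params", model="mlp", epoch=1)

# One short query identifies several objects at once.
cnn_params = [store.load(i) for i in store.find(kind="params", model="cnn")]
```

In this sketch a job would only need to carry the query ``{"kind": "params", "model": "cnn"}`` rather than the parameters themselves, which is the sense in which a query is compact and components become interchangeable.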
@@ -61,10 +58,10 @@ Also the training system have no clue of epochs, instead, it knows batchize and
 Many techniques are used behind the streaming interface.
 The stream is implemented on database cursors, so every search returns only cursors, not the actual data.
 The actual data is loaded only when the generator is evaluated.
-The data loading is further optimise
+The data loading is further optimised:

-1. data are compressed and decompressed,
-2. the dataloaded in bulk model to optimise the IO traffic
+1. Data are compressed and decompressed.
+2. The data is loaded in bulk mode to optimise the IO traffic.
 3. Augmentation and random sampling are computed on the fly after the data are loaded onto the local computer.
 4. To save space, there will also be a cache system that stores only the recent blob data.

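The lazy, cursor-style loading described in this hunk can be sketched as follows, under stated assumptions. Nothing here is the TensorDB implementation: ``REMOTE``, ``fetch_bulk`` and ``BlobStream`` are hypothetical names, a dict stands in for the remote blob layer, ``zlib`` stands in for whatever compression is actually used, and the cache is a simple least-recently-used cache.

```python
import zlib
from collections import OrderedDict
from typing import Dict, Iterable, Iterator, List

# Stand-in for the remote blob layer: id -> compressed bytes (made-up data).
REMOTE: Dict[int, bytes] = {
    i: zlib.compress(f"sample-{i}".encode()) for i in range(6)
}
fetch_calls: List[List[int]] = []  # records each bulk request, for illustration


def fetch_bulk(ids: List[int]) -> Dict[int, bytes]:
    """Pretend to fetch several compressed blobs in one round trip."""
    fetch_calls.append(list(ids))
    return {i: REMOTE[i] for i in ids}


class BlobStream:
    """Lazy loader with bulk fetching and a small cache of recent blobs."""

    def __init__(self, bulk_size: int = 2, cache_size: int = 4) -> None:
        self.bulk_size = bulk_size
        self.cache_size = cache_size
        self._cache: "OrderedDict[int, bytes]" = OrderedDict()

    def _remember(self, blob_id: int, data: bytes) -> None:
        """Mark a blob as most recently used and evict the oldest entries."""
        self._cache[blob_id] = data
        self._cache.move_to_end(blob_id)
        while len(self._cache) > self.cache_size:
            self._cache.popitem(last=False)

    def iterate(self, ids: Iterable[int]) -> Iterator[bytes]:
        """Yield decompressed blobs; nothing is fetched until iteration starts."""
        ids = list(ids)
        for start in range(0, len(ids), self.bulk_size):
            chunk = ids[start:start + self.bulk_size]
            # Serve what the cache already holds, fetch the rest in one request.
            chunk_data = {i: self._cache[i] for i in chunk if i in self._cache}
            missing = [i for i in chunk if i not in chunk_data]
            if missing:
                for blob_id, packed in fetch_bulk(missing).items():
                    chunk_data[blob_id] = zlib.decompress(packed)
            for blob_id in chunk:
                self._remember(blob_id, chunk_data[blob_id])
                yield chunk_data[blob_id]


loader = BlobStream(bulk_size=2, cache_size=4)
gen = loader.iterate([0, 1, 2, 3])   # only a generator so far, no data loaded
calls_before = len(fetch_calls)
first = next(gen)                    # evaluation triggers the first bulk fetch
rest = list(gen)
again = list(loader.iterate([2, 3])) # served from the cache, no new request
```

Augmentation or random sampling would be applied to each yielded item on the local machine; it is left out here to keep the sketch short.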
@@ -82,8 +79,7 @@ The existing implementation is based on MongoDB.
 Further implementations on other databases will be released depending on progress.
 It will be straightforward to port the TensorDB system to Google Cloud, AWS and Azure.

-
-The following tutorial is based on the MongoDb implmenetation.
+The following tutorials are based on the MongoDB implementation.


 Install MongoDB
@@ -93,13 +89,11 @@ The installation instruction of Mongodb can be found at