@@ -4,57 +4,79 @@ Introduction
 
 .. _data_analytics_pipeline:
 
-|dal_full_name| (|dal_short_name|) is a library that provides building blocks covering all stages of data analytics: data acquisition
-from a data source, preprocessing, transformation, data mining, modeling, validation, and decision making.
+|dal_full_name| (|dal_short_name|) is a library that provides building
+blocks covering all stages of data analytics: data acquisition from a
+data source, preprocessing, transformation, data mining, modeling,
+validation, and decision making.
 
 .. image:: _static/data_analytics_stages.png
    :width: 800
    :alt: Data analytics stages
 
-|dal_short_name| supports the concept of the end-to-end analytics when some of data analytics stages are performed on the
-edge devices (close to where the data is generated and where it is finally consumed). Specifically,
-|dal_short_name| Application Programming Interfaces (APIs) are agnostic about a particular cross-device
-communication technology and, therefore, can be used within different end-to-end analytics frameworks.
+|dal_short_name| supports the concept of the end-to-end analytics when
+some of data analytics stages are performed on the edge devices (close
+to where the data is generated and where it is finally
+consumed). Specifically, |dal_short_name| Application Programming
+Interfaces (APIs) are agnostic about a particular cross-device
+communication technology and, therefore, can be used within different
+end-to-end analytics frameworks.
 
 .. image:: _static/e2eframeworks.png
    :width: 800
    :alt: End to End Analytics Frameworks
 
 |dal_short_name| consists of the following major components:
 
-- The :ref:`Data Management <data_management>` component includes classes and utilities for data acquisition, initial preprocessing and normalization,
-  for data conversion into numeric formats (performed by one of supported Data Sources), and for model representation.
+- The :ref:`Data Management <data_management>` component includes
+  classes and utilities for data acquisition, initial preprocessing
+  and normalization, for data conversion into numeric formats
+  (performed by one of supported Data Sources), and for model
+  representation.
 
-- The :ref:`Algorithms <algorithms>` component consists of classes that implement algorithms for data analysis (data mining) and data modeling
-  (training and prediction). These algorithms include clustering, classification, regression, and recommendation algorithms.
-  Algorithms support the following computation modes:
+- The :ref:`Algorithms <algorithms>` component consists of classes
+  that implement algorithms for data analysis (data mining) and data
+  modeling (training and prediction). These algorithms include
+  clustering, classification, regression, and recommendation
+  algorithms. Algorithms support the following computation modes:
 
-  - :ref:`Batch processing <Batch>`: algorithms work with the entire data set to produce the final result
+  - :ref:`Batch processing <Batch>`: algorithms work with the entire
+    data set to produce the final result
 
-  - :ref:`Online processing <Online>`: algorithms process a data set in blocks streamed into the device’s memory
+  - :ref:`Online processing <Online>`: algorithms process a data set
+    in blocks streamed into the device’s memory
 
-  - :ref:`Distributed processing <Distributed>`: algorithms operate on a data set distributed across several devices
-    (compute nodes)
+  - :ref:`Distributed processing <Distributed>`: algorithms operate
+    on a data set distributed across several devices (compute nodes)
 
-  Distributed algorithms in |dal_short_name| are abstracted from underlying cross-device communication technology,
-  which enables use of the library in a variety of multi-device computing and data transfer scenarios.
+  Distributed algorithms in |dal_short_name| are abstracted from
+  underlying cross-device communication technology, which enables
+  use of the library in a variety of multi-device computing and
+  data transfer scenarios.
 
-  Depending on the usage, algorithms operate both on actual data (data set) and data models:
+  Depending on the usage, algorithms operate both on actual data
+  (data set) and data models:
 
   - Analysis algorithms typically operate on data sets.
 
-  - Training algorithms typically operate on a data set to train an appropriate data model.
+  - Training algorithms typically operate on a data set to train an
+    appropriate data model.
 
-  - Prediction algorithms typically work with the trained data model and with a working data set.
+  - Prediction algorithms typically work with the trained data model
+    and with a working data set.
 
-- The **Utilities** component includes auxiliary functionality intended to be used for design of
-  classes and implementation of methods such as memory allocators or type traits.
+- The **Utilities** component includes auxiliary functionality
+  intended to be used for design of classes and implementation of
+  methods such as memory allocators or type traits.
 
-- The **Miscellaneous** component includes functionality intended to be used by |dal_short_name|
-  algorithms and applications for algorithm customization and optimization on various stages of the
-  analytical pipeline. Examples of such algorithms include solvers and random number generators.
+- The **Miscellaneous** component includes functionality intended to
+  be used by |dal_short_name| algorithms and applications for
+  algorithm customization and optimization on various stages of the
+  analytical pipeline. Examples of such algorithms include solvers
+  and random number generators.
 
-Classes in Data Management, Algorithms, Utilities, and Miscellaneous components cover the most
-important usage scenarios and allow seamless implementation of complex data analytics workflows
-through direct API calls. At the same time, the library is an object-oriented framework that helps
-customize the API by redefining particular classes and methods of the library.
+Classes in Data Management, Algorithms, Utilities, and Miscellaneous
+components cover the most important usage scenarios and allow seamless
+implementation of complex data analytics workflows through direct API
+calls. At the same time, the library is an object-oriented framework
+that helps customize the API by redefining particular classes and
+methods of the library.
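
Reviewer note: the three computation modes described in this hunk (batch, online, distributed) may be easier to follow with a small illustration. The sketch below is plain Python and does not use the library's API. `PartialSum`, `mean_batch`, `mean_online`, and `mean_distributed` are hypothetical names invented for this note, and a simple mean stands in for a real algorithm.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class PartialSum:
    """Hypothetical partial result: just enough state to finish a mean later."""
    count: int = 0
    total: float = 0.0

    def update(self, block: Sequence[float]) -> "PartialSum":
        # Fold one block of observations into the running state.
        self.count += len(block)
        self.total += sum(block)
        return self

    def merge(self, other: "PartialSum") -> "PartialSum":
        # Combine two partial results computed independently.
        return PartialSum(self.count + other.count, self.total + other.total)

    def finalize(self) -> float:
        return self.total / self.count


def mean_batch(data: Sequence[float]) -> float:
    # Batch: the entire data set is available at once.
    return PartialSum().update(data).finalize()


def mean_online(blocks: Iterable[Sequence[float]]) -> float:
    # Online: blocks arrive one at a time; only the partial state is kept.
    partial = PartialSum()
    for block in blocks:
        partial.update(block)
    return partial.finalize()


def mean_distributed(shards: Sequence[Sequence[float]]) -> float:
    # Distributed: each shard stands in for the data held by one compute node.
    # How partial results travel between nodes is deliberately left out.
    partials = [PartialSum().update(shard) for shard in shards]
    merged = PartialSum()
    for partial in partials:
        merged = merged.merge(partial)
    return merged.finalize()


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
batch_result = mean_batch(data)
online_result = mean_online([data[:2], data[2:]])
distributed_result = mean_distributed([data[:3], data[3:]])
```

All three calls give the same mean for the same data. The only difference is how the data reaches the computation. The merge step says nothing about transport, which mirrors the hunk's point that distributed algorithms are abstracted from the cross-device communication technology.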