Commit e31ea75

WIP
1 parent 38b2f70 commit e31ea75

1 file changed

+42
-2
lines changed

content/work-with-data.rst

Lines changed: 42 additions & 2 deletions
@@ -255,18 +255,58 @@ Good things
255255
- Supported by many tools out of the box
256256
- Easily shared
257257

258-
Bas things
258+
Bad things
259259
++++++++++
260260

261261
- Can be slow to read and write
262262
- High potential to increase required disk space substantially (e.g. when storing floating point numbers as text)
263263
- Prone to losing precision when storing floating point numbers
264264
- Multi-dimensional data can be hard to represent
265-
- While the data format might be specified, the data structure might not be clear when startig to read the data.
265+
- While the data format might be specified, the data structure might not be clear when starting to read the data.
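
The points about disk space and precision can be illustrated with a minimal sketch using only the Python standard library. The six-significant-digit format is an assumption chosen for illustration; the actual loss depends on how the numbers are formatted when written.

```python
import struct

value = 0.1 + 0.2  # not exactly representable; repr is 0.30000000000000004

# Writing with a fixed number of significant digits (here 6) drops information.
text = f"{value:.6g}"
restored = float(text)
print(text, restored == value)  # the round trip through short text is lossy

# repr() keeps enough digits to round-trip, but needs more characters
# than the 8 bytes of the binary representation.
full_text = repr(value)
packed = struct.pack("<d", value)  # little-endian IEEE 754 double
print(len(full_text), len(packed))

# The binary representation restores the value exactly.
unpacked = struct.unpack("<d", packed)[0]
print(unpacked == value)
```

Writing all significant digits avoids the precision loss but makes the text form larger than the binary one, which is the trade-off described in the list above.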
266266

267267
Further considerations
268268
~~~~~~~~~~~~~~~~~~~~~~
269269

270+
- The closer your stored data is to the code, the more likely it depends on the environment you are working in.
271+
If you e.g. `pickle` a generated model, you can only be sure that the model will work as intended if you
272+
load it in an environment that has the same versions of all libraries the model depends on.
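
One possible way to make this dependency visible is to store information about the environment together with the pickled object. The following is a minimal sketch, not a complete solution: the helper names `dump_with_environment` and `load_checked` are hypothetical, the model is a plain dictionary stand-in, and only the Python version is recorded. Versions of the libraries a real model depends on could be recorded in the same way.

```python
import pickle
import platform
import warnings


def dump_with_environment(obj):
    """Pickle obj together with the Python version that produced it (hypothetical helper)."""
    payload = {"python_version": platform.python_version(), "data": obj}
    return pickle.dumps(payload)


def load_checked(blob):
    """Unpickle and warn if the current Python version differs (hypothetical helper).

    Only unpickle data from sources you trust.
    """
    payload = pickle.loads(blob)
    saved = payload["python_version"]
    current = platform.python_version()
    if saved != current:
        warnings.warn(f"Pickled with Python {saved}, loading with {current}")
    return payload["data"]


# Stand-in for a trained model; a real model would come from your library.
model = {"weights": [0.1, 0.2, 0.3]}
blob = dump_with_environment(model)
restored = load_checked(blob)
print(restored == model)
```

A warning like this does not make the pickle portable; it only tells you when the loading environment differs from the one that wrote the file.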
273+
274+
275+
Exercise
276+
--------
277+
278+
.. challenge::
279+
280+
You have a model that you have been training for a while.
281+
Let's assume it's a relatively simple neural network (consisting of a network structure and its associated weights).
282+
283+
Let's consider two scenarios:
284+
285+
A: You have a different project that is supposed to take this model and do some processing with it to determine
286+
its efficiency after different amounts of training.
287+
288+
B: You want to publish the model and make it available to others.
289+
290+
What are good options to store the model in each of these scenarios?
291+
292+
.. solution::
293+
294+
A: Some export into a binary format that can be easily read, e.g. pickle or a specific export function from the library you use.
295+
It also depends on whether you intend to make the intermediary steps available to others.
296+
If you do, you might also want to consider storing structure and weights separately or use a format specific for the
297+
type of model you are training, to keep the data independent of the library.
298+
299+
B: You might want to consider a more general format that is supported by many libraries, e.g. ONNX, or a format that is
300+
specifically designed for the type of model you are training.
301+
You might also want to consider additionally storing the model in a way that is easily readable by humans, to make it easier for others
302+
to understand the model.
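
The idea of storing structure and weights separately can be sketched with the standard library alone. This is an illustration under assumptions, not a recommended model format: the file names `structure.json` and `weights.bin`, the layout of the structure dictionary, and the helpers `save_model` and `load_model` are all hypothetical.

```python
import json
import tempfile
from array import array
from pathlib import Path

# Hypothetical description of a tiny network: one dense layer, 2 inputs, 1 output.
structure = {"layers": [{"type": "dense", "inputs": 2, "outputs": 1}]}
# Two weights plus one bias, stored as 64-bit floats.
weights = array("d", [0.5, -1.25, 0.1])


def save_model(directory, structure, weights):
    """Write the structure as human-readable JSON and the weights as raw bytes."""
    directory = Path(directory)
    (directory / "structure.json").write_text(json.dumps(structure, indent=2))
    # tobytes() uses the machine's native byte order.
    (directory / "weights.bin").write_bytes(weights.tobytes())


def load_model(directory):
    """Read the structure and weights back from the two files."""
    directory = Path(directory)
    loaded_structure = json.loads((directory / "structure.json").read_text())
    loaded_weights = array("d")
    loaded_weights.frombytes((directory / "weights.bin").read_bytes())
    return loaded_structure, loaded_weights


with tempfile.TemporaryDirectory() as tmp:
    save_model(tmp, structure, weights)
    loaded_structure, loaded_weights = load_model(tmp)

print(loaded_structure == structure, loaded_weights == weights)
```

The JSON file can be read by people and by any language, while the weights stay compact. For publishing, an established format such as ONNX is likely the better choice, since it also defines how the structure is to be interpreted.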
303+
304+
305+
Convert untidy data into tidy data with Pandas
306+
----------------------------------------------
307+
308+
309+
270310

271311

272312
Things to remember
