@@ -255,18 +255,58 @@ Good things
255255- Supported by many tools out of the box
256256- Easily shared
257257
258- Bas things
258+ Bad things
259259++++++++++
260260
261261- Can be slow to read and write
262262- High potential to increase required disk space substantially (e.g. when storing floating point numbers as text)
263263- Prone to losing precision when storing floating point numbers
264264- Multi-dimensional data can be hard to represent
265- - While the data format might be specified, the data structure might not be clear when startig to read the data.
265+ - While the data format might be specified, the data structure might not be clear when starting to read the data.
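
The disk space and precision points above can be illustrated with a minimal sketch that uses only the Python standard library. The sample values and the choice of six decimals are assumptions made for illustration; real data and formats will behave differently in detail.

```python
import struct

# Hypothetical sample data: values that have no short decimal representation.
values = [1 / 3, 2 / 3, 1e-10 / 3]

# Text with a fixed number of decimals: compact, but digits are cut off.
short_text = "\n".join(f"{v:.6f}" for v in values)
restored_short = [float(s) for s in short_text.splitlines()]

# Text written with repr(): round-trips exactly, but needs more bytes.
full_text = "\n".join(repr(v) for v in values)
restored_full = [float(s) for s in full_text.splitlines()]

# Binary: 8 bytes per 64-bit float, no conversion to decimal digits.
binary = struct.pack(f"<{len(values)}d", *values)
restored_binary = list(struct.unpack(f"<{len(values)}d", binary))

print("fixed decimals exact:", restored_short == values)
print("repr text exact:     ", restored_full == values)
print("binary exact:        ", restored_binary == values)
print("bytes as repr text:  ", len(full_text.encode("ascii")))
print("bytes as binary:     ", len(binary))
```

Text that keeps every digit avoids the precision loss but takes more space than the binary form, while text with fewer digits saves space at the cost of exactness.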
266266
267267Further considerations
268268~~~~~~~~~~~~~~~~~~~~~~
269269
270+ - The closer your stored data is to the code, the more likely it depends on the environment you are working in.
271+ If you, for example, `pickle` a generated model, you can only be sure that the model will work as intended if you
272+ load it in an environment that has the same versions of all libraries the model depends on.
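
One way to make this dependency visible is to store a small description of the environment next to the pickle file. The following is a minimal sketch, not an established convention: the helper names (`environment_record`, `dump_with_environment`, `environment_mismatches`), the `.env.json` sidecar suffix and the plain dict standing in for a model are all assumptions made for illustration.

```python
import json
import pickle
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def environment_record(packages):
    """Collect the Python version and the installed versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


def dump_with_environment(obj, path, packages=()):
    """Pickle `obj` and write a JSON sidecar file describing the environment."""
    path = Path(path)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    sidecar = path.with_name(path.name + ".env.json")
    sidecar.write_text(json.dumps(environment_record(packages), indent=2))
    return sidecar


def environment_mismatches(path, packages=()):
    """Return {name: (stored, current)} for every version that differs."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".env.json")
    stored = json.loads(sidecar.read_text())
    current = environment_record(packages)
    diffs = {}
    if stored["python"] != current["python"]:
        diffs["python"] = (stored["python"], current["python"])
    for name, version in current["packages"].items():
        old = stored["packages"].get(name)
        if old != version:
            diffs[name] = (old, version)
    return diffs


# Stand-in for a trained model: a plain dict (hypothetical example data).
model = {"layers": [4, 8, 1], "weights": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "model.pkl"
    dump_with_environment(model, target, packages=["numpy"])
    # Check the sidecar first, before unpickling anything.
    mismatches = environment_mismatches(target, packages=["numpy"])
    with target.open("rb") as fh:
        restored = pickle.load(fh)

print("version mismatches:", mismatches)
```

The sidecar is plain JSON on purpose: it can still be read in an environment where unpickling the model itself fails.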
273+
274+
275+ Exercise
276+ --------
277+
278+ .. challenge ::
279+
280+ You have a model that you have been training for a while.
281+ Let's assume it's a relatively simple neural network (consisting of a network structure and its associated weights).
282+
283+ Let's consider two scenarios:
284+
285+ A: You have a different project that is supposed to take this model and do some processing with it to determine
286+ its efficiency after different amounts of training.
287+
288+ B: You want to publish the model and make it available to others.
289+
290+ What are good options to store the model in each of these scenarios?
291+
292+ .. solution ::
293+
294+ A: Some export into a binary format that can be easily read, e.g. pickle or a specific export function from the library you use.
295+ It also depends on whether you intend to make the intermediary steps available to others.
296+ If you do, you might also want to consider storing structure and weights separately or using a format specific to the
297+ type of model you are training, to keep the data independent of the library.
298+
299+ B: You might want to consider a more general format that is supported by many libraries, e.g. ONNX, or a format that is
300+ specifically designed for the type of model you are training.
301+ You might also want to consider additionally storing the model in a way that is easily readable by humans, to make it easier for others
302+ to understand the model.
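
Storing structure and weights separately, as suggested in the solution, could look roughly like the sketch below. It uses only the Python standard library; the file names, the layout of the `structure` dict and the helpers `save_model` and `load_model` are hypothetical. In a real project you would more likely rely on the export functions of your library or on a format such as ONNX.

```python
import json
import tempfile
from array import array
from pathlib import Path

# Hypothetical model: two dense layers described by their input/output sizes.
structure = {"layers": [{"inputs": 2, "outputs": 3}, {"inputs": 3, "outputs": 1}]}
weights = [0.1 * i for i in range(9)]  # 2*3 + 3*1 = 9 weights, flattened


def save_model(directory, structure, weights):
    """Write the structure as readable JSON and the weights as raw doubles."""
    directory = Path(directory)
    (directory / "structure.json").write_text(json.dumps(structure, indent=2))
    with (directory / "weights.bin").open("wb") as fh:
        # "d" stores 64-bit floats in the byte order of the current machine.
        array("d", weights).tofile(fh)


def load_model(directory):
    """Read the structure first, then use it to know how many weights to read."""
    directory = Path(directory)
    structure = json.loads((directory / "structure.json").read_text())
    count = sum(layer["inputs"] * layer["outputs"] for layer in structure["layers"])
    loaded = array("d")
    with (directory / "weights.bin").open("rb") as fh:
        loaded.fromfile(fh, count)
    return structure, loaded.tolist()


with tempfile.TemporaryDirectory() as tmp:
    save_model(tmp, structure, weights)
    loaded_structure, loaded_weights = load_model(tmp)

print("structure identical:", loaded_structure == structure)
print("weights identical:  ", loaded_weights == weights)
```

The JSON file can be read by people and by any language, while the binary weights stay compact and exact. The raw file carries no information about byte order or data type, so that would have to be documented alongside it.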
303+
304+
305+ Convert untidy data into tidy data with Pandas
306+ ----------------------------------------------
307+
308+
309+
270310
271311
272312Things to remember