@@ -116,90 +116,90 @@ Table below describes some data formats:
116116 | storage/sharing:
117117
118118 * - :ref:`Pickle <pickle>`
119- - 🔴
119+ - ❌
120120 - 🟡
121- - 🟢
121+ - ✅
122122 - 🟡
123123 - 🟡
124- - 🔴
124+ - ❌
125125
126126 * - :ref:`CSV <csv>`
127- - 🟢
128- - 🔴
129- - 🔴
130- - 🟢
127+ - ✅
128+ - ❌
129+ - ❌
130+ - ✅
131131 - 🟡
132- - 🟢
132+ - ✅
133133
134134 * - :ref:`Feather <feather>`
135- - 🔴
136- - 🟢
137- - 🔴
138- - 🟢
139- - 🔴
140- - 🔴
135+ - ❌
136+ - ✅
137+ - ❌
138+ - ✅
139+ - ❌
140+ - ❌
141141
142142 * - :ref:`Parquet <parquet>`
143- - 🔴
144- - 🟢
143+ - ❌
144+ - ✅
145145 - 🟡
146- - 🟢
146+ - ✅
147147 - 🟡
148- - 🟢
148+ - ✅
149149
150150 * - :ref:`npy <npy>`
151- - 🔴
151+ - ❌
152152 - 🟡
153- - 🔴
154- - 🔴
155- - 🟢
156- - 🔴
153+ - ❌
154+ - ❌
155+ - ✅
156+ - ❌
157157
158158 * - :ref:`HDF5 <hdf5>`
159- - 🔴
160- - 🟢
161- - 🔴
162- - 🔴
163- - 🟢
164- - 🟢
159+ - ❌
160+ - ✅
161+ - ❌
162+ - ❌
163+ - ✅
164+ - ✅
165165
166166 * - :ref:`NetCDF4 <netcdf4>`
167- - 🔴
168- - 🟢
169- - 🔴
170- - 🔴
171- - 🟢
172- - 🟢
167+ - ❌
168+ - ✅
169+ - ❌
170+ - ❌
171+ - ✅
172+ - ✅
173173
174174 * - :ref:`JSON <json>`
175- - 🟢
176- - 🔴
175+ - ✅
176+ - ❌
177177 - 🟡
178- - 🔴
179- - 🔴
180- - 🟢
178+ - ❌
179+ - ❌
180+ - ✅
181181
182182 * - :ref:`Excel <excel>`
183- - 🔴
184- - 🔴
185- - 🔴
183+ - ❌
184+ - ❌
185+ - ❌
186186 - 🟡
187- - 🔴
188- - 🟢
187+ - ❌
188+ - ✅
189189
190190 * - :ref:`Graph formats <graph>`
191191 - 🟡
192192 - 🟡
193- - 🔴
194- - 🔴
195- - 🔴
193+ - ❌
194+ - ❌
195+ - ❌
196196 - 🟡
197197
198198.. important::
199199
200- - 🟢 : Good
200+ - ✅ : Good
201201 - 🟡 : Ok / depends on a case
202- - 🔴 : Bad
202+ - ❌ : Bad
203203
204204
205205Storing arbitrary Python objects
@@ -216,10 +216,10 @@ Pickle
216216 - **Type**: Binary format
217217 - **Packages needed:** None (:mod:`pickle`-module is included with Python).
218218 - **Space efficiency:** 🟡
219- - **Arbitrary data:** 🟢
219+ - **Arbitrary data:** ✅
220220 - **Tidy data:** 🟡
221221 - **Array data:** 🟡
222- - **Long term archival/sharing:** 🔴 ! See warning below.
222+ - **Long term archival/sharing:** ❌ ! See warning below.
223223 - **Best use cases:** Saving Python objects for debugging.
224224
225225.. warning::
@@ -282,11 +282,11 @@ CSV (comma-separated values)
282282
283283 - **Type:** Text format
284284 - **Packages needed:** numpy, pandas
285- - **Space efficiency:** 🔴
286- - **Arbitrary data:** 🔴
287- - **Tidy data:** 🟢
285+ - **Space efficiency:** ❌
286+ - **Arbitrary data:** ❌
287+ - **Tidy data:** ✅
288288 - **Array data:** 🟡
289- - **Long term archival/sharing:** 🟢
289+ - **Long term archival/sharing:** ✅
290290 - **Best use cases:** Sharing data. Small data. Data that needs to be human-readable.
291291
292292CSV is by far the most popular file format, as it is human-readable and easily shareable.
@@ -367,11 +367,11 @@ Feather
367367
368368 - **Type:** Binary format
369369 - **Packages needed:** pandas, pyarrow
370- - **Space efficiency:** 🟢
371- - **Arbitrary data:** 🔴
372- - **Tidy data:** 🟢
373- - **Array data:** 🔴
374- - **Long term archival/sharing:** 🔴
370+ - **Space efficiency:** ✅
371+ - **Arbitrary data:** ❌
372+ - **Tidy data:** ✅
373+ - **Array data:** ❌
374+ - **Long term archival/sharing:** ❌
375375 - **Best use cases:** Temporary storage of tidy data.
376376
377377`Feather <https://arrow.apache.org/docs/python/feather.html>`__ is a file format for storing data frames quickly.
@@ -408,11 +408,11 @@ Parquet
408408
409409 - **Type:** Binary format
410410 - **Packages needed:** pandas, pyarrow
411- - **Space efficiency:** 🟢
411+ - **Space efficiency:** ✅
412412 - **Arbitrary data:** 🟡
413- - **Tidy data:** 🟢
413+ - **Tidy data:** ✅
414414 - **Array data:** 🟡
415- - **Long term archival/sharing:** 🟢
415+ - **Long term archival/sharing:** ✅
416416 - **Best use cases:** Working with big datasets in tidy data format. Archival of said data.
417417
418418`Parquet <https://arrow.apache.org/docs/python/parquet.html>`__ is a standardized open-source
@@ -495,10 +495,10 @@ npy (numpy array format)
495495 - **Type**: Binary format
496496 - **Packages needed:** numpy
497497 - **Space efficiency:** 🟡
498- - **Arbitrary data:** 🟢
499- - **Tidy data:** 🔴
500- - **Array data:** 🟢
501- - **Long term archival/sharing:** 🔴
498+ - **Arbitrary data:** ✅
499+ - **Tidy data:** ❌
500+ - **Array data:** ✅
501+ - **Long term archival/sharing:** ❌
502502 - **Best use cases:** Saving numpy arrays temporarily.
503503
504504If you want to temporarily store numpy arrays, you can use the :func:`numpy.save`- and :func:`numpy.load`-functions::
@@ -532,11 +532,11 @@ HDF5 (Hierarchical Data Format version 5)
532532
533533 - **Type:** Binary format
534534 - **Packages needed:** numpy, pandas, PyTables, h5py
535- - **Space efficiency:** 🟢
536- - **Arbitrary data:** 🔴
537- - **Tidy data:** 🔴
538- - **Array data:** 🟢
539- - **Long term archival/sharing:** 🟢
535+ - **Space efficiency:** ✅
536+ - **Arbitrary data:** ❌
537+ - **Tidy data:** ❌
538+ - **Array data:** ✅
539+ - **Long term archival/sharing:** ✅
540540 - **Best use cases:** Working with big datasets in array data format.
541541
542542HDF5 is a high performance storage format for storing large amounts of data in multiple datasets in a single file.
@@ -601,11 +601,11 @@ NetCDF4 (Network Common Data Form version 4)
601601
602602 - **Type**: Binary format
603603 - **Packages needed:** pandas, netCDF4/h5netcdf, xarray
604- - **Space efficiency:** 🟢
605- - **Arbitrary data:** 🔴
606- - **Tidy data:** 🔴
607- - **Array data:** 🟢
608- - **Long term archival/sharing:** 🟢
604+ - **Space efficiency:** ✅
605+ - **Arbitrary data:** ❌
606+ - **Tidy data:** ❌
607+ - **Array data:** ✅
608+ - **Long term archival/sharing:** ✅
609609 - **Best use cases:** Working with big datasets in array data format. Especially useful if the dataset contains spatial or temporal dimensions. Archiving or sharing those datasets.
610610
611611NetCDF4 is a data format that uses HDF5 as its file format, but it has standardized structure of datasets and metadata related to these datasets.
@@ -679,11 +679,11 @@ JSON (JavaScript Object Notation)
679679
680680 - **Type**: Text format
681681 - **Packages needed:** None (:mod:`json`-module is included with Python).
682- - **Space efficiency:** 🔴
682+ - **Space efficiency:** ❌
683683 - **Arbitrary data:** 🟡
684- - **Tidy data:** 🔴
685- - **Array data:** 🔴
686- - **Long term archival/sharing:** 🟢
684+ - **Tidy data:** ❌
685+ - **Array data:** ❌
686+ - **Long term archival/sharing:** ✅
687687 - **Best use cases:** Saving nested/relational data, storing web requests.
688688
689689JSON is a popular human-readable data format.
@@ -712,11 +712,11 @@ Excel
712712
713713 - **Type**: Text format
714714 - **Packages needed:** `openpyxl <https://openpyxl.readthedocs.io/en/stable/>`__
715- - **Space efficiency:** 🔴
716- - **Arbitrary data:** 🔴
715+ - **Space efficiency:** ❌
716+ - **Arbitrary data:** ❌
717717 - **Tidy data:** 🟡
718- - **Array data:** 🔴
719- - **Long term archival/sharing:** 🟢
718+ - **Array data:** ❌
719+ - **Long term archival/sharing:** ✅
720720 - **Best use cases:** Sharing data in many fields. Quick data analysis.
721721
722722Excel is very popular in social sciences and economics.
@@ -735,9 +735,9 @@ Graph formats (adjacency lists, gt, GraphML etc.)
735735 - **Type**: Many different formats
736736 - **Packages needed:** Depends on a format.
737737 - **Space efficiency:** 🟡
738- - **Arbitrary data:** 🔴
739- - **Tidy data:** 🔴
740- - **Array data:** 🔴
738+ - **Arbitrary data:** ❌
739+ - **Tidy data:** ❌
740+ - **Array data:** ❌
741741 - **Long term archival/sharing:** 🟡
742742 - **Best use cases:** Saving graphs or data that can be represented as a graph.
743743