Commit d07d622

ADDED: file types in docs
1 parent 7a995d6 commit d07d622

3 files changed (+219, -3 lines changed)

_posts/CORA.RDR/User-Guide/2025-10-18-Upload-Dataset.md

Lines changed: 4 additions & 2 deletions
@@ -49,10 +49,12 @@ To add a dataset you need to introduce two types of information:
 * **The metadata fields**: The data that describes your dataset, its author, affiliation, related web page... You need
 to fill in *as many metadata fields as possible* with the information of the dataset. The explanation for each
 field can be found in the article [Dataset Fields](https://iciq-dmp.github.io/CORA.RDR/User-Guide/dataset-fields).
-* **The files**: The files that you want to add to this dataset. There are no limitations regarding format, but there
+* **The files**: The files that you want to add to this dataset. Drag and drop your desired files or use the button
+"Select Files to Add" to upload your files. There are no limitations regarding format, but there
 are some recommendations that you should follow. There is also a soft limit of 500 GB in a single dataset, but it can
 be lifted on request to the admins of CORA.RDR. The article
-[Dataset Files](https://iciq-dmp.github.io/CORA.RDR/User-Guide/dataset-files)
+[Dataset Files](https://iciq-dmp.github.io/CORA.RDR/User-Guide/dataset-files) explains all recommendations and
+guidelines regarding the upload of files to a dataset.

 After you have introduced all the possible fields and your files in the dataset, you need to go to the bottom of the
 page, where you will see a part of the page similar to the following, where you need to check the checkbox "I have read

_posts/CORA.RDR/User-Guide/2025-10-20-Dataset-Files.md

Lines changed: 154 additions & 1 deletion
@@ -7,10 +7,163 @@ parent: User Guide
 grand_parent: CORA.RDR
 ---

-## Explanation for each dataset files
+# File structure in CORA.RDR / Dataverse

+The main goal of CORA.RDR is to store and preserve digital data.

+Digital data is stored in **Datasets**. You can understand each dataset in CORA.RDR as a computer folder that
+can contain files or other folders, just as any
+folder of your
+computer.

+Digital data are stored in software-specific file formats. The choice of software and format depends on the intended
+use. For example:

+* Spreadsheets support formulas, sorting, and filtering, features that are not preserved in word processors.
+* Word processors support complex formatting such as page numbers and tables of contents.

+However, saving a file in a program’s default format does **not** guarantee long-term usability. Risks include:
+
+* Dependency on specific software or software versions
+* Obsolescence of proprietary formats
+* Exclusive or expensive software required to access files
+* Loss of significant characteristics if the format does not support them
+
+To minimize these risks and improve long-term accessibility, use file formats with a high likelihood of remaining usable for many years.
+
+Here are some guidelines on which formats are accepted and the recommended folder structure for a dataset.
+We encourage you to follow these guidelines for the datasets that you upload to CORA.RDR, but they are not a
+technical limitation, which means that in some cases the guidelines can be bypassed with a justified reason.
+
+For example, we recommend using CSV for tabular data, but users often store their tabular data in Excel files.
+When such an Excel file is converted into a CSV file,
+it may lose an important part of its formatting, which affects the quality of the data. In this case
+and in others, it may be reasonable to bypass the guidelines on the recommended file formats.
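As a rough illustration of the trade-off described above, the following sketch writes a small table to a UTF-8 CSV file with Python's standard `csv` module and reads it back. The rows and the file name `measurements.csv` are made up for this example. Only cell values are written: formulas, cell formatting, and multiple sheets have no CSV representation, which is one reason keeping the original Excel file next to the CSV can be justified.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data: a header row followed by value rows.
ROWS = [
    ["sample_id", "temperature_K", "yield_percent"],
    ["A-01", 298.15, 87.5],
    ["A-02", 310.0, 91.2],
]


def write_csv(path: Path, rows: list) -> None:
    """Write rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def read_csv(path: Path) -> list:
    """Read the file back; every value is returned as plain text."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "measurements.csv"
    write_csv(csv_path, ROWS)
    round_trip = read_csv(csv_path)
```

After the round trip, numbers come back as strings, so any typing or formatting that mattered in the spreadsheet has to be documented separately (for example in a README inside the dataset).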
+
+
+## Preferred and Non-Preferred File Formats (DANS)
+
+Preferred formats are file formats that DANS – based on international agreements – is confident offer the best long-term
+guarantees in terms of usability, accessibility, and sustainability. Deposits of research data in preferred formats
+will **always** be accepted by DANS.
+
+Non-preferred formats are widely used in addition to preferred formats and are expected to remain moderately to
+reasonably usable, accessible, and robust in the long term.
+
+
+## General Guidelines
+
+DANS believes that file formats best suited for long-term sustainability and accessibility:
+
+* Are frequently used
+* Have open specifications
+* Are independent of specific software, developers, or vendors
+
+In practice, it is not always possible to use formats that satisfy all criteria.
+
+It may be desirable to deposit certain original data in **non-preferred** formats because these formats are in common use
+(e.g., Esri Shapefiles, Microsoft Access databases, SPSS `.sav` files).
+In those cases, DANS asks you to deposit:
+
+* The original format **and**
+* A preferred format for long-term sustainability.
+
+
+### File Format Overview
+
+This information is based on [this article](https://dans.knaw.nl/en/file-formats/), with some
+modifications.
+
+#### Text Documents
+
+| Type | Preferred Formats | Non-Preferred Formats |
+|---------------------------|-------------------------------------------------------------------------------------------------|-----------------------------|
+| **Text documents** | PDF/A (.pdf), ODT (.odt), Microsoft Word (.doc), Office Open XML (.docx), Rich Text File (.rtf) | PDF other than PDF/A (.pdf) |
+| **Plain text** | Unicode text (.txt) | Non-Unicode text (.txt) |
+| **Markup languages** | XML (.xml), HTML (.html) + related (.css, .xslt, .js, .es), Markdown (.md) ||
+| **Programming languages** | MATLAB, NetCDF, Text-Fabric, Python ||
+
+#### Spreadsheets
+
+| Preferred | Non-Preferred |
+|------------------------|---------------------------------------------------------------|
+| ODS (.ods), CSV (.csv) | Microsoft Excel (.xls), Office Open XML (.xlsx), PDF/A (.pdf) |
+
+#### Databases
+
+| Preferred | Non-Preferred |
+|----------------------------------------|------------------------------------------------------------------------|
+| SQL (.sql), SIARD (.siard), CSV (.csv) | Microsoft Access (.mdb, .accdb), dBase (.dbf), HDF5 (.hdf5, .he5, .h5) |
+
+#### Statistical Data
+
+| Preferred | Non-Preferred |
+|----------------------------------------------------------|----------------------------------------------------------------------------------------|
+| SPSS (.dat/.sps), STATA (.dat/.do), JASP (.csv/.html), R | SPSS Portable (.por), SPSS (.sav), STATA (.dta), SAS (.7dat, .sd2, .tpt), JASP (.jasp) |
+
+#### Raster Images
+
+| Preferred | Non-Preferred |
+|------------------------------------------------------------------------------------|---------------|
+| JPEG (.jpg, .jpeg), TIFF (.tif, .tiff), PNG (.png), JPEG 2000 (.jp2), DICOM (.dcm) ||
+
+#### Vector Images
+
+| Preferred | Non-Preferred |
+|------------|-----------------------------------------------------------------------|
+| SVG (.svg) | Adobe Illustrator (.ai), EPS (.eps), WMF/EMF (.wmf, .emf), CDR (.cdr) |
+
+#### Audio
+
+| Preferred | Non-Preferred |
+|--------------------------------------------------------------------------|--------------------------------------------------------------|
+| BWF (.bwf), MXF (.mxf), Matroska (.mka), FLAC (.flac), OPUS, WAVE (.wav) | MP3 (.mp3), AAC (.aac, .m4a), AIFF (.aif, .aiff), OGG (.ogg) |
+
+#### Video
+
+| Preferred | Non-Preferred |
+|-------------------------------------------------------------------------------------------------|-----------------------------------|
+| MXF (.mxf), Matroska (.mkv), MPEG-4 (.mp4, .m4a, .m4v, …), MPEG-2 (.mpg, .mpeg, .m2v, .mpg2, …) | AVI (.avi), QuickTime (.mov, .qt) |
+
+#### CAD (Computer-Aided Design)
+
+| Preferred | Non-Preferred |
+|--------------------------------------------|------------------------------------------------------|
+| AutoCAD DXF R12 (ASCII) (.dxf), SVG (.svg) | AutoCAD DXF (other versions), DWG (.dwg), DGN (.dgn) |
+
+#### GIS (Geographical Information Systems)
+
+| Preferred | Non-Preferred |
+|----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
+| GML (.gml), MIF/MID (.mif/.mid), GeoJSON (.json), GeoPackage (.gpkg) | Esri Shapefiles (.shp + related), MapInfo (.tab + related), KML (.kml, .kmz), Esri Geodatabase (.gdb), Project files (.mxd, .wor, .qgs) |
+
+#### Georeferenced Images
+
+| Preferred | Non-Preferred |
+|-------------------------------------------------------------------------------------|----------------------|
+| GeoTIFF (.tif, .tiff), TIFF World File (.tfw + .tif), JPEG World File (.jgw + .jpg) | ERDAS IMAGINE (.img) |
+
+#### Raster GIS
+
+| Preferred | Non-Preferred |
+|-------------------------|------------------------------------------------------------------|
+| ASCII GRID (.asc, .txt) | Esri GRID (.grd), Surfer Grid (.grd, .srf), ERDAS IMAGINE (.img) |
+
+#### 3D
+
+| Preferred | Non-Preferred |
+|----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| OBJ (.obj), PLY (.ply), X3D (.x3d), glTF 2.0 (.gltf, .glb), COLLADA (.dae), LAS (.las, .laz), IFC (.ifc) | Autodesk FBX (.fbx), Blender (.blend), glTF 1.0, 3D PDF (.pdf), Google Draco (.drc), Artec (.a3d), Agisoft Metashape (.psx, .psz), STL (.stl), VRML (.wrl, .wrz, .vrml) |
+
+#### RDF
+
+| Preferred | Non-Preferred |
+|---------------------------------------------------------------|---------------|
+| RDF/XML, Trig (.trig), Turtle (.ttl), NTriples (.nt), JSON-LD ||
+
+#### CAQDAS (Qualitative Data)
+
+| Preferred | Non-Preferred |
+|-----------|------------------------------------------|
+| REFI-QDA | ATLAS.TI copy bundle, NVivo project file |
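The tables above lend themselves to a quick pre-upload check. The sketch below is only an illustration: the two sets hold a small, hand-picked subset of the extensions listed above, and the function names `classify` and `needs_preferred_copy` are invented for this example. Extensions whose status depends on the file's contents (such as `.pdf`, `.txt`, or `.json`) are deliberately left out and reported as unknown.

```python
from pathlib import Path

# Partial, illustrative subset of the tables above; extend as needed.
PREFERRED = {".odt", ".docx", ".rtf", ".ods", ".csv", ".sql",
             ".png", ".svg", ".flac", ".mkv", ".gpkg"}
NON_PREFERRED = {".xls", ".xlsx", ".mdb", ".accdb", ".sav", ".dta",
                 ".ai", ".mp3", ".avi", ".shp", ".stl"}


def classify(filename: str) -> str:
    """Return 'preferred', 'non-preferred', or 'unknown' for a file name."""
    suffix = Path(filename).suffix.lower()
    if suffix in PREFERRED:
        return "preferred"
    if suffix in NON_PREFERRED:
        return "non-preferred"
    return "unknown"


def needs_preferred_copy(filenames: list) -> list:
    """List non-preferred files that lack a preferred file with the same stem."""
    preferred_stems = {
        Path(name).stem for name in filenames if classify(name) == "preferred"
    }
    return [
        name for name in filenames
        if classify(name) == "non-preferred"
        and Path(name).stem not in preferred_stems
    ]


missing = needs_preferred_copy(["results.xlsx", "results.csv", "survey.sav"])
```

Matching by file stem is a simplifying assumption; it mirrors the guideline of depositing the original format together with a preferred one, but a real dataset may pair files differently.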

_posts/iMarina/Developer-Guide/2025-07-03-Developer-Guide.md

Lines changed: 61 additions & 0 deletions
@@ -228,3 +228,64 @@ More info in [Programming standard](https://iciq-dmp.github.io/iMarina/Developer



+
+
+
+
+
+
+Additions from the readme of iMarina load
+---
+
+
+#### Initialize `PYTHONPATH`
+This initialization adds the project path to Python's module search path, so that it can find our tests.
+
+##### With `pytest.ini`
+To use the library correctly and find our tests, we must create the following file and place it in the project
+root directory (`Desktop/iMarina-load`), unless it is already present.
+
+Create the `pytest.ini` file with the following content if not present:
+
+```ini
+[pytest]
+pythonpath = .
+```
+
+Pytest uses the `pytest.ini` file to define global settings.
+The option `pythonpath = .`
+tells pytest to add the project root folder (`.`) to Python's module search path, as `PYTHONPATH` would.
+
+This file must be in the project root (same folder as `src/` and `tests/`).
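To make the effect of this setting more concrete, the sketch below reproduces it by hand: it places a project root at the head of `sys.path` and then imports a module from a package inside it. pytest does this itself when `pythonpath` is set (the option is available from pytest 7.0), so none of this code is needed in the project. The temporary directory, the package name `demo_src`, and the `greet` function are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: <root>/demo_src/main.py, standing in for src/main.py.
    package = root / "demo_src"
    package.mkdir()
    (package / "__init__.py").write_text("", encoding="utf-8")
    (package / "main.py").write_text(
        "def greet():\n    return 'hello from demo_src.main'\n",
        encoding="utf-8",
    )

    # Similar in spirit to `pythonpath = .`: put the root at the head of sys.path.
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_src.main")
        message = module.greet()
    finally:
        # Undo the change so the rest of the process is unaffected.
        sys.path.remove(str(root))

    root_still_on_path = str(root) in sys.path
```

Without the `sys.path.insert` line the import would fail with `ModuleNotFoundError`, which is the typical symptom of a missing or misplaced `pytest.ini`.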
+
+
+##### With environment variable
+To ensure that Python sees the project root, you can manually add it to `PYTHONPATH` by running this command:
+
+```shell
+export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}/home/your_usersystem/Desktop/iMarina-load"
+```
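One way to check that the variable has the intended effect is to start a child Python process with `PYTHONPATH` set and ask it whether the directory appears in its `sys.path`. This is a minimal sketch: it uses a temporary directory as a stand-in for the project root, so the path is not the real `iMarina-load` location.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as project_root:
    # Copy the current environment and append the stand-in project root,
    # using the platform's separator (":" on Linux/macOS, ";" on Windows).
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = os.pathsep.join(p for p in (existing, project_root) if p)

    # Ask a child interpreter whether the directory made it into sys.path.
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.argv[1] in sys.path)", project_root],
        env=env, capture_output=True, text=True, check=True,
    )
    found = result.stdout.strip()
```

Note that `export` only affects the current shell session and the processes started from it; a new terminal needs the command again unless it is added to the shell's startup file.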
+
+---
+
+
+### Creating tests
+Test functions must have names that begin with `test_` and live in files whose names start with the same
+prefix.
+
+#### Test Folder
+A test file is named, for example, `test_main.py`.
+
+All the tests that we implement must be stored in a folder called `tests`.
+
+#### Folder Location
+The `tests` folder must be located in the **root directory** of your project:
+`iMarina-load/tests`.
+
+In the file called `test_main.py`, you must first import the classes and functions needed for the test.
+For example:
+
+```python
+from src.main import Researcher, is_visitor
+```
+
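For orientation, a test file following these conventions could look roughly like the sketch below. The signatures of `Researcher` and `is_visitor` are not shown in this guide, so the example defines simplified stand-ins instead of importing the real ones; the fields `name` and `category` and the visitor rule are assumptions made only for illustration.

```python
from dataclasses import dataclass


# Stand-ins for `from src.main import Researcher, is_visitor`.
# The real class and function may look different; these are assumptions.
@dataclass
class Researcher:
    name: str
    category: str


def is_visitor(researcher: Researcher) -> bool:
    """Hypothetical rule: a researcher is a visitor if the category says so."""
    return researcher.category.lower() == "visitor"


# pytest collects functions whose names start with `test_`.
def test_is_visitor_true_for_visitor():
    assert is_visitor(Researcher(name="Ada", category="Visitor"))


def test_is_visitor_false_for_staff():
    assert not is_visitor(Researcher(name="Grace", category="Staff"))
```

In the real `tests/test_main.py`, the stand-in definitions would be replaced by the import shown above, and the tests would be run by calling `pytest` from the project root.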
