42 changes: 42 additions & 0 deletions CODE_OF_CONDUCT.md
# Code of Conduct

## Our Commitment

We are dedicated to providing a welcoming, inclusive, and respectful
environment for all contributors. Whether you are contributing code,
documentation, or feedback, we expect everyone to interact in a manner that
is constructive and respectful.

## Standards of Behavior

- **Be Respectful**: Treat everyone with kindness and respect.
- **Be Collaborative**: Value diverse perspectives and work together toward
solutions.
- **Be Professional**: Avoid personal attacks, harassment, or inappropriate
language.
- **Be Open**: Encourage questions and contributions, regardless of experience
level or background.

## Unacceptable Behavior

The following actions are not tolerated:
- Harassment or discrimination based on gender, race, sexual orientation,
religion, or other personal characteristics.
- Personal attacks, threats, or intimidation.
- Inappropriate or offensive language.

## Reporting Issues

If you experience or witness unacceptable behavior, please report it to the
repository maintainers at [[email protected]]. All reports will be handled
confidentially.

## Consequences

Participants who violate this Code of Conduct may be removed from the project
and banned from future contributions.

## Acknowledgements

This Code of Conduct is adapted from the
[Contributor Covenant](https://www.contributor-covenant.org/), version 2.1.
184 changes: 182 additions & 2 deletions README.md
# data-analysis-tools
Public data analysis tools for freva
# Freva Tool Configuration with `tool.toml`

Freva allows users to define data analysis tools in a structured and
reproducible way using `tool.toml` configuration files. These files provide
a standardized interface for defining tool metadata, parameters, dependencies,
and execution commands. Freva parses these files to automatically create a
user interface (via command-line or web UI) for applying the tools in a
reproducible manner.

## Define your tool via a `tool.toml` file

The `tool.toml` file simplifies the process of:

1. **Tool Definition**:
   - Specify metadata such as name, version, and description.
   - Define input parameters, execution commands, and dependencies.
2. **Reproducibility**:
   - Document dependencies and build processes for consistent execution.
   - Capture all required information in a single, portable file.
3. **User Interface Integration**:
   - Automatically generate a CLI or web interface for the tool based on
     the `tool.toml` configuration.

**Review comment:** I would suggest dropping the `.toml` extension from the name to make it simpler and more straightforward. We might find benefits in other config formats, such as YAML or `.conf`, in the future; for example, we are currently thinking about switching `drs_config.toml` to YAML. Without the extension in the name, we would not have to rename all tool configurations or, worse, ask tool authors to change them in their tools. I also suggest renaming the file to something like `.freva` rather than `tool`. It is true that this is a tool's configuration file, but users usually say that Freva is running their program and that they need to set up Freva for it, so I think `.freva` fits the configuration better. In the home directory we would then have `~/.freva/...`, `~/.frevatool`, or something else that refers to Freva instead of `~/tool/...`, which might confuse users, since that name says nothing about Freva.

### Writing the `tool.toml` File

The `tool.toml` file is structured into several sections:

#### **1. General Information**
The `[tool]` section provides metadata about the tool.

```toml
[tool]
name = "example-tool" # Unique name for the tool
version = "v1.0.0" # Semantic version
authors = ["Author 1 <[email protected]>", "Author 2 <[email protected]>"]
summary = "A brief description of what this tool does."
title = "A catchy title for the tool (optional)"
description = """
A detailed explanation of the tool's purpose, functionality, and usage.
"""
```

#### **2. Execution Settings**
The `[tool.run]` section defines how the tool is executed.

```toml
[tool.run]
command = "python script.py" # Command to run the tool
dependencies = ["python=3.10", "numpy"] # Conda-Forge dependencies
```

If the tool requires compilation or installation steps, include a `build.sh` script
and define build-time dependencies in the `[tool.build]` section:

```toml
[tool.build]
dependencies = ["rust"] # Build-specific dependencies
```

**Review comment:** Since we can already define Rust or any other prerequisite in the `tool.run` dependencies, and since every tool gets a single environment for its `tool.toml` under the `~/tool/...` directory, do we need these dependencies under a separate `build` sub-table? I'd suggest doing it another way, like Conda does, and dedicating this section to fetching external resources:

```toml
[tool.build]
url = "https://www.example.com/executor.tgz"
# or
local = "/somewhere/on/levante/executor.tgz"
```

For multiple URLs or local paths, I again see the value of using YAML instead of TOML, because TOML treats everything as a dictionary. But I'm not sure about that part, and I think it needs further brainstorming.
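A minimal sketch of how a build step could be invoked, assuming `build.sh` sits next to `tool.toml` and runs with the tool folder as its working directory; neither detail is specified above, and `run_build` is a hypothetical helper.

```python
import subprocess
import tempfile
from pathlib import Path


def run_build(tool_dir: Path) -> bool:
    """Run build.sh in tool_dir if it exists; return True if a build was run."""
    script = tool_dir / "build.sh"
    if not script.is_file():
        return False  # nothing to build
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(["bash", str(script)], cwd=tool_dir, check=True)
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(run_build(Path(tmp)))  # False: no build.sh present
```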

#### **3. Input Parameters**

The `[tool.input_parameters]` section specifies the parameters the tool accepts.

Each parameter is defined with:

- `title`: The name or description of the parameter.
- `type`: The expected data type (e.g., string, integer, float).
- `mandatory`: Whether the parameter is required.
- `default`: The default value (if optional).
- `help`: A detailed explanation of the parameter's purpose.


```toml
[tool.input_parameters.parameter_1]
title = "Input File"
type = "string"
mandatory = true
help = "The path to the input file for the analysis."

[tool.input_parameters.parameter_2]
title = "Verbose Mode"
type = "bool"
default = false
help = "Enable verbose output during execution."
```

#### **4. Advanced Features**

Freva supports advanced parameter types, such as databrowser integration or ranges, for more complex use cases.

##### Databrowser Parameters

These allow integration with a databrowser search interface:

```toml
[tool.input_parameters.parameter_db]
title = "Search Parameter"
type = "databrowser"
search_key = "variable"
default = "tas"
help = """
Integrates with the databrowser to search for data based on the specified key.
"""
```

**Review comment (@mo-dkrz, Nov 24, 2024):** What are your thoughts on extending `search_key` to `search_attributes` (or whatever name fits best) and defining it as shown below? Users would then have more control over a single databrowser parameter. Also, because this is a Freva-specific type, what do you think about naming the type `freva` rather than `solr` or `databrowser`? All of the other parameter types introduced here are already familiar to scientists who write scripts.

```toml
[tool.input_parameters.parameter_db]
title = "advanced custom"
type = "freva"
search_attrs = [
    { variable = "something" },
    { variable = "something_else" },
    { variable_not = "avoided var" },
    { experiment_not = "historical" },
]
```

I defined separate tables because `variable` appears twice and each key in a TOML table must be unique; again, another disadvantage of using TOML. With this approach we don't have to decide on any unnecessary predefined keys; we can point users, admins, or whoever sets up the `tool.toml` to the documentation.

##### Range Parameters

**Review comment:** What do you think about moving the range type to the previous section? Or, if you want to keep it here, let's at least make it more advanced:

```toml
[tool.input_parameters.parameter_range]
title = "Range Example"
type = "range"
value_type = "datetime" # float or int
default = ["2024-01-01T00:00:00", "2024-12-31T23:59:59", "1d"]
```

Alternatively, we could simply handle `value_type` in the backend.

Define ranges for numerical inputs:

```toml
[tool.input_parameters.parameter_range]
title = "Range Example"
type = "range"
default = [0, 10, 1]
help = "Specify a numerical range in the format [start, end, increment]."
```
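The `[start, end, increment]` format can be expanded into explicit values as sketched below. Whether Freva treats `end` as inclusive is not stated; this sketch assumes it is, and `expand_range` is a hypothetical helper.

```python
def expand_range(spec: list[float]) -> list[float]:
    """Expand [start, end, increment] into explicit values.

    Assumption: the end value is inclusive.
    """
    start, end, step = spec
    if step <= 0:
        raise ValueError("increment must be positive")
    # Note: floor division can undercount when the increment is not exactly
    # representable, e.g. (1.0 - 0.0) // 0.1 == 9.0.
    count = int((end - start) // step) + 1
    # Multiply instead of accumulating to avoid floating-point drift.
    return [start + i * step for i in range(count)]


print(expand_range([0, 10, 1]))        # 11 values: 0 through 10
print(expand_range([0.0, 1.0, 0.25]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```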

## Tool Execution Workflow

1. Define the `tool.toml` file:
   - Create the `tool.toml` file with the required sections.
1. Parse with Freva:
   - Freva parses the `tool.toml` file to generate a CLI or web interface.
1. Run the tool:
   - Users can apply the tool through the Freva interface, providing input parameters interactively or via scripts.
1. Reproducibility:
   - Freva ensures all executions are logged with version and parameter details for reproducibility.
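Freva's actual log format is not described here. The sketch below only illustrates the kind of record the reproducibility step implies, one JSON line per run with the tool version and parameters; the field names and the helper `execution_record` are hypothetical.

```python
import json
from datetime import datetime, timezone


def execution_record(tool: dict, params: dict) -> str:
    """Serialise what is needed to re-run a tool: name, version, parameters, time."""
    record = {
        "tool": tool["name"],
        "version": tool["version"],
        "parameters": params,
        "started": datetime.now(timezone.utc).isoformat(),
    }
    # sort_keys makes records easy to diff across runs.
    return json.dumps(record, sort_keys=True)


line = execution_record({"name": "example-tool", "version": "v1.0.0"},
                        {"parameter_1": "data.nc", "parameter_2": False})
print(line)
```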


## How to Contribute

We encourage users to contribute their tools to this repository. Follow these
steps to add your tool:

1. Navigate to the repository page and click the **Fork** button.
1. Clone your fork to your local machine:
   ```console
   git clone https://github.com/your-username/freva-tools.git
   cd freva-tools
   ```
1. Create a new branch for your tool:
   ```console
   git checkout -b add-your-tool-name
   ```
1. Create a new folder in the `tools/` directory with a descriptive name for your tool:
   ```console
   mkdir tools/your-tool-name
   ```
1. Add your tool files to this folder:
   - `tool.toml`: Defines your tool's metadata, parameters, and execution logic.
   - `build.sh` (if applicable): Handles build or installation steps.
   - Source code files (e.g., Python scripts, shell scripts, or other source files).
1. Include a `LICENSE` file to specify how others can use your tool. For scientific tools,
   consider using a license that encourages proper attribution or citation (e.g., BSD 3-Clause License).
1. Add and commit your changes:
   ```console
   git add tools/your-tool-name
   git commit -m "Add your-tool-name"
   ```
1. Push your branch to your fork:
   ```console
   git push origin add-your-tool-name
   ```
1. Navigate to the original repository and open a pull request.
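Before opening the pull request, a quick local check can confirm the tool folder contains the files listed above. This is a hypothetical helper, not a script shipped with the repository; it assumes `tool.toml` and `LICENSE` are required and treats `build.sh` as optional.

```python
import tempfile
from pathlib import Path

# Assumption: every tool folder must contain these files; build.sh stays optional.
REQUIRED_FILES = ("tool.toml", "LICENSE")


def missing_files(tool_dir: Path) -> list[str]:
    """Return the required files that are absent from tool_dir."""
    return [name for name in REQUIRED_FILES if not (tool_dir / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "your-tool-name"
    folder.mkdir()
    (folder / "tool.toml").write_text('[tool]\nname = "your-tool-name"\n')
    print(missing_files(folder))  # ['LICENSE']
```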

**Review comment:** What do you think about adding a few lines about the `create_environment.py` script?

```console
python create_environment.py --help
usage: create-conda-env [-h] [-d] [-f] [-p PREFIX] [-v] input_dir

positional arguments:
  input_dir            The path to the tool definition.

options:
  -h, --help           show this help message and exit
  -d, --dev            Use development mode for any installation.
  -f, --force          Force recreation of the environment.
  -p, --prefix PREFIX  The install prefix where the environment should be installed
  -v, --verbose
```


## Best Practices

- **Semantic Versioning**: Use semantic version numbers for tools (e.g., `v1.0.0`) to track changes.
- **Dependencies**: List all required dependencies explicitly in `tool.run.dependencies` or `tool.build.dependencies`.
- **Parameters**: Provide descriptive help messages for all parameters to guide users.
- **Reproducibility**: Document any additional setup steps (e.g., `build.sh`) and ensure they are included in the tool's environment.

## Additional Resources

- [TOML Syntax Documentation](https://toml.io)
- [conda-forge](https://conda-forge.org/)