3 changes: 3 additions & 0 deletions .tool-versions
@@ -0,0 +1,3 @@
java temurin-11.0.21+9

nit: DX may ask to revert this; however, I personally appreciate it.

sbt 1.9.7
python 3.11.6
59 changes: 59 additions & 0 deletions README.md
@@ -309,9 +309,68 @@ $ dx run bam_chrom_counter -istage-common.bam=project-BQbJpBj0bvygyQxgQ1800Jkk:f
* `SoftwareRequirement` and `InplaceUpdateRequirement` are not yet supported
* Publishing a dxCompiler-generated workflow as a global workflow is not supported

## Authenticated HTTP Imports

dxCompiler supports importing WDL files from private HTTP sources that require authentication, such as private GitHub repositories.

### Configuration

Set the `WDL_IMPORT_TOKEN` environment variable to your access token:

```bash
# For GitHub, use a Personal Access Token (PAT)
export WDL_IMPORT_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"

# Then run dxCompiler as usual
java -jar dxCompiler.jar compile workflow.wdl -project project-xxxx -folder /my/workflows/
```
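The token is meant to be sent as a Bearer credential, i.e. as an `Authorization: Bearer <token>` request header. The class that actually performs the fetch (`AuthenticatedHttpFileSource`) is not part of this diff, so the following is only an illustration of that header, built with the JDK 11 `java.net.http` API (matching the Temurin 11 pin in `.tool-versions`). `BearerHeaderSketch` and `buildImportRequest` are hypothetical names, and the request is built but never sent.

```scala
import java.net.URI
import java.net.http.HttpRequest

object BearerHeaderSketch {
  /** Hypothetical helper: builds (but never sends) a GET request for an import URL,
    * attaching "Authorization: Bearer <token>" only when a token is supplied. */
  def buildImportRequest(url: String, token: Option[String]): HttpRequest = {
    val builder = HttpRequest.newBuilder(URI.create(url)).GET()
    // No token means no Authorization header, i.e. a plain anonymous request
    token.foreach(t => builder.header("Authorization", s"Bearer $t"))
    builder.build()
  }
}
```

For example, `buildImportRequest(url, sys.env.get("WDL_IMPORT_TOKEN")).headers().firstValue("Authorization")` is non-empty only when the variable is set.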

### Supported Domains

By default, the token is only sent to these domains (for security):
- `github.com`
- `raw.githubusercontent.com`

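To change which domains receive the token, set the `WDL_IMPORT_TOKEN_DOMAINS` environment variable. It replaces the default list rather than extending it, so include `github.com` and `raw.githubusercontent.com` if you still need them: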

```bash
# Replaces the defaults (comma-separated); the GitHub domains are listed explicitly to keep them
export WDL_IMPORT_TOKEN_DOMAINS="github.com,raw.githubusercontent.com,gitlab.com,my-private-server.com"
```
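The companion object's `defaultAllowedDomains` in this PR parses the variable by splitting on commas, trimming and lower-casing each entry, and dropping empty entries; when the variable is present it replaces the built-in list. The sketch restates that logic as a pure function over an environment map so the replace-not-extend behaviour is easy to check. `AllowedDomainsSketch` and its `env` parameter are illustrative only; the real code reads `sys.env` directly.

```scala
object AllowedDomainsSketch {
  val DefaultDomains: Set[String] = Set("github.com", "raw.githubusercontent.com")

  /** Same rules as defaultAllowedDomains, but taking the environment as a parameter.
    * If WDL_IMPORT_TOKEN_DOMAINS is present it REPLACES the defaults entirely. */
  def allowedDomains(env: Map[String, String]): Set[String] =
    env.get("WDL_IMPORT_TOKEN_DOMAINS") match {
      case Some(domains) =>
        // Trim and lower-case each entry; drop blanks such as ",," or trailing commas
        domains.split(",").map(_.trim.toLowerCase).filter(_.nonEmpty).toSet
      case None => DefaultDomains
    }
}
```

So `WDL_IMPORT_TOKEN_DOMAINS=gitlab.com` on its own means the token is no longer sent to GitHub at all.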

### Example Usage

In your WDL file, import from a private repository:

```wdl
version 1.0

import "https://raw.githubusercontent.com/myorg/private-repo/main/tasks/my_task.wdl" as private_tasks

workflow my_workflow {
  call private_tasks.my_task
}
```

### Getting a GitHub Token

1. Go to https://github.com/settings/tokens
2. Click "Generate new token (classic)"
3. Select the `repo` scope for private repository access
4. Copy the generated token and set it as `WDL_IMPORT_TOKEN`

### Security Notes

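- The token is sent only when the import URL's host exactly matches (ignoring case) one of the allowed domains; subdomains such as `api.github.com` are not matched unless listed
- The token is never logged
- If `WDL_IMPORT_TOKEN` is not set, HTTP imports behave as before, so only public URLs can be fetched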
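The allowed-domain check in `shouldAuthenticate` (in the new protocol class in this PR) compares the URL's host against each allowed domain with `equalsIgnoreCase`; there is no wildcard or suffix matching. The sketch below restates that check as a standalone function so the edge cases are easy to see. `HostMatchSketch` is an illustrative name, not part of dxCompiler.

```scala
import java.net.URI

object HostMatchSketch {
  /** Mirrors shouldAuthenticate: true only if a token is configured AND the URI's
    * host equals an allowed domain, ignoring case. Subdomains and look-alike hosts
    * such as "github.com.evil.example" do not match. */
  def shouldAuthenticate(uri: URI, token: Option[String], allowed: Set[String]): Boolean =
    token.isDefined && Option(uri.getHost).exists(host => allowed.exists(_.equalsIgnoreCase(host)))
}
```

Because only the host is compared, the check as written would also attach the token to a plain `http://` URL on an allowed host, which may be worth confirming is intended.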

For more details, see [Authenticated Imports documentation](doc/AUTHENTICATED_IMPORTS.md).

## Additional information

- [Advanced options](doc/ExpertOptions.md) explains additional compiler options
- [Authenticated Imports](doc/AUTHENTICATED_IMPORTS.md) explains how to import WDL files from private GitHub repositories
- [Internals](doc/Internals.md) describes current compiler structure (_work in progress_)
- [Tips](doc/Tips.md) examples of how to write good WDL code
- [Debugging](doc/Debugging.md) recommendations on how to debug workflows on the DNAnexus platform
4 changes: 4 additions & 0 deletions RELEASE_NOTES.md
@@ -2,6 +2,10 @@

## Unreleased

### New Features

* **Authenticated HTTP Imports**: Added support for importing WDL files from private HTTP sources that require authentication (e.g., private GitHub repositories). Set the `WDL_IMPORT_TOKEN` environment variable to a Bearer token to enable authenticated imports. By default, the token is sent only to `github.com` and `raw.githubusercontent.com`; set `WDL_IMPORT_TOKEN_DOMAINS` to a comma-separated list to replace these defaults. See [Authenticated Imports documentation](doc/AUTHENTICATED_IMPORTS.md) for details.

## 2.15.0 2025-09-29

* Added support for new region in OCI Ashburn
28 changes: 22 additions & 6 deletions compiler/src/main/scala/dxCompiler/Main.scala
@@ -14,7 +14,8 @@ import dx.core.languages.wdl.WdlOptions
import dx.dxni.DxNativeInterface
import dx.translator.{Extras, TranslatorFactory}
import dx.util.protocols.DxFileAccessProtocol
import dx.util.{Enum, FileSourceResolver, FileUtils, Logger, TraceLevel}
import dx.util.{Enum, FileAccessProtocol, FileSourceResolver, FileUtils, LocalFileAccessProtocol, Logger, TraceLevel}
import dx.core.io.AuthenticatedHttpFileAccessProtocol
import spray.json.{JsNull, JsValue}
import wdlTools.types.TypeCheckingRegime

@@ -67,17 +68,25 @@ object Main {
* - creates a FileSourceResolver that looks for local files in any configured -imports
* directories and has a DxFileAccessProtocol
* - initializes a Logger
* - configures authenticated HTTP imports if WDL_IMPORT_TOKEN is set
* @param options parsed options
* @return (FileSourceResolver, Logger)
*/
private def initCommon(options: Options): (FileSourceResolver, Logger) = {
val logger = initLogger(options)
val imports: Vector[Path] = options.getList[Path]("imports")
val fileResolver = FileSourceResolver.create(
imports,
Vector(DxFileAccessProtocol()),
logger


// Create authenticated HTTP protocol for importing from private repositories
val httpProtocol = AuthenticatedHttpFileAccessProtocol.fromEnvironment(logger)

// Build protocol list - order matters, first matching protocol wins
val protocols: Vector[FileAccessProtocol] = Vector(
LocalFileAccessProtocol(imports, logger),
httpProtocol,
DxFileAccessProtocol()
)

val fileResolver = FileSourceResolver(protocols)
FileSourceResolver.set(fileResolver)
(fileResolver, logger)
}
@@ -877,7 +886,8 @@ object Main {
| input values may only be specified for the top-level workflow.
| -leaveWorkflowsOpen Leave created workflows open (otherwise they are closed).
| -p | -imports <string> Directory to search for imported WDL or CWL files. May be specified
| multiple times.
| multiple times. For HTTP imports from private repositories,
| set the WDL_IMPORT_TOKEN environment variable (see below).
| -projectWideReuse Look for existing applets/workflows in the entire project
| before generating new ones. The default search scope is the
| target folder only.
@@ -926,6 +936,12 @@ object Main {
| -verboseKey <module> Print verbose output only for a specific module. May be
| specified multiple times.
| -logFile <path> File to use for logging output; defaults to stderr.
|
|Environment variables
| WDL_IMPORT_TOKEN Bearer token for authenticated HTTP imports (e.g., GitHub PAT
| for private repositories). Token is sent only to allowed domains.
| WDL_IMPORT_TOKEN_DOMAINS Comma-separated list of domains to send the token to.
| Defaults to: github.com,raw.githubusercontent.com
|""".stripMargin

def main(args: Vector[String]): Unit = {
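The review thread on `shouldAuthenticate` asks whether the token decision is made per import. Assuming `FileSourceResolver` picks a protocol from each address's URI scheme (the protocol list in `initCommon` suggests this, but the resolver's source is not part of this diff), every import is classified on its own. The sketch below models that under stated assumptions: `ImportDispatchSketch`, `Handler`, `Local`, `Http` and `Dx` are hypothetical stand-ins, and the scheme sets for the local and DNAnexus protocols are assumed. Under those assumptions no two protocols share a scheme, so the "first matching protocol wins" ordering would only matter if they did.

```scala
import java.net.URI

object ImportDispatchSketch {
  // Simplified stand-ins for the protocols registered in initCommon.
  // The "file"/no-scheme and "dx" scheme sets are assumptions; only the
  // http/https schemes of AuthenticatedHttpFileAccessProtocol appear in this diff.
  sealed trait Handler
  case object Local extends Handler
  final case class Http(withToken: Boolean) extends Handler
  case object Dx extends Handler

  /** Classifies a single import address on its own; no state is shared between calls. */
  def handlerFor(address: String, token: Option[String], allowed: Set[String]): Handler = {
    val uri = URI.create(address)
    Option(uri.getScheme).map(_.toLowerCase) match {
      case None | Some("file") => Local // relative paths and file: URIs
      case Some("http") | Some("https") =>
        // Same rule as shouldAuthenticate: token present AND exact host match
        val hostAllowed = Option(uri.getHost).exists(h => allowed.exists(_.equalsIgnoreCase(h)))
        Http(withToken = token.isDefined && hostAllowed)
      case Some("dx") => Dx
      case Some(other) =>
        throw new IllegalArgumentException(s"no protocol registered for scheme '$other'")
    }
  }
}
```

Applied to the reviewer's three imports with a token set, `local.wdl` resolves locally, while both GitHub-hosted URLs carry the token (harmless for the public file); a URL on any other host is fetched anonymously.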
@@ -0,0 +1,104 @@
package dx.core.io

import dx.util.{FileAccessProtocol, FileUtils, Logger}
import java.net.URI
import java.nio.charset.Charset

/**
* HTTP file access protocol with Bearer token authentication support.
*
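* The token is normally read from the WDL_IMPORT_TOKEN environment variable via
* `AuthenticatedHttpFileAccessProtocol.fromEnvironment`. Tokens are sent only to
* allowed domains (configurable via WDL_IMPORT_TOKEN_DOMAINS) to prevent credential
* leakage to untrusted servers.
*
* @param token Optional Bearer token (None disables authentication; use `fromEnvironment` to read WDL_IMPORT_TOKEN)
* @param allowedDomains Hosts that receive the token (exact, case-insensitive match); defaults to
*   WDL_IMPORT_TOKEN_DOMAINS if set, otherwise github.com and raw.githubusercontent.com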
* @param encoding Character encoding for file content
* @param logger Logger for trace/debug output
*/
case class AuthenticatedHttpFileAccessProtocol(
token: Option[String] = None,
allowedDomains: Set[String] = AuthenticatedHttpFileAccessProtocol.defaultAllowedDomains,
encoding: Charset = FileUtils.DefaultEncoding,
logger: Logger = Logger.Quiet
) extends FileAccessProtocol {

override val schemes: Vector[String] = Vector(FileUtils.HttpScheme, FileUtils.HttpsScheme)
override val supportsDirectories: Boolean = true

/**
* Determines if authentication should be used for the given URI.
* Only returns true if a token is configured AND the URI's host exactly matches (ignoring case) an allowed domain.

Checking my understanding here: if the import belongs to one of the allowed domains AND a token is set in the env, then this protocol will be used. Is this done on an import-by-import basis?

What happens if we have imports from multiple sources? E.g.:

import local.wdl
import http://raw.githubusercontent.com/remote/public.wdl
import https://github.com/company/repo/private.wdl


If I'm reading this correctly, the resolver (above) does it per source. It's probably worth trying it to make sure, but that looks like what it's doing.

*/
private def shouldAuthenticate(uri: URI): Boolean = {
token.isDefined && Option(uri.getHost).exists(host =>
allowedDomains.exists(_.equalsIgnoreCase(host))
)
}

override def resolve(address: String): AuthenticatedHttpFileSource = {
val uri = URI.create(address)
val useAuth = shouldAuthenticate(uri)
if (useAuth) {
logger.trace(s"Using authenticated HTTP for import from: ${uri.getHost}")
}
AuthenticatedHttpFileSource(uri, encoding, isDirectory = false, if (useAuth) token else None)(address)
}

override def resolveDirectory(address: String): AuthenticatedHttpFileSource = {
val uri = URI.create(address)
val useAuth = shouldAuthenticate(uri)
if (useAuth) {
logger.trace(s"Using authenticated HTTP for directory import from: ${uri.getHost}")
}
AuthenticatedHttpFileSource(uri, encoding, isDirectory = true, if (useAuth) token else None)(address)
}
}

object AuthenticatedHttpFileAccessProtocol {

/** Environment variable name for the Bearer token */
val TokenEnvVar: String = "WDL_IMPORT_TOKEN"

/** Environment variable name for custom allowed domains */
val DomainsEnvVar: String = "WDL_IMPORT_TOKEN_DOMAINS"

/** Default allowed domains that will receive the auth token */
val defaultDomains: Set[String] = Set(
"github.com",
"raw.githubusercontent.com"
)

/**
* Gets the allowed domains from environment variable or defaults.
* WDL_IMPORT_TOKEN_DOMAINS should be a comma-separated list of domains.
*/
lazy val defaultAllowedDomains: Set[String] = {
sys.env.get(DomainsEnvVar) match {
case Some(domains) =>
domains.split(",").map(_.trim.toLowerCase).filter(_.nonEmpty).toSet
case None =>
defaultDomains
}
}

/**
* Creates an instance with configuration from environment variables.
*
* @param logger Logger for trace output (token values are never logged)
* @return AuthenticatedHttpFileAccessProtocol configured from environment
*/
def fromEnvironment(logger: Logger = Logger.Quiet): AuthenticatedHttpFileAccessProtocol = {
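// Treat an empty or whitespace-only value as unset so an empty Bearer header is never sent
val tokenOpt = sys.env.get(TokenEnvVar).map(_.trim).filter(_.nonEmpty)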
if (tokenOpt.isDefined) {
logger.trace(
s"${TokenEnvVar} found; authenticated HTTP imports enabled for domains: ${defaultAllowedDomains.mkString(", ")}"
)
}
AuthenticatedHttpFileAccessProtocol(
token = tokenOpt,
allowedDomains = defaultAllowedDomains,
logger = logger
)
}
}