JuliaDecisionFocusedLearning
diff --git a/Project.toml b/Project.toml
Lines changed: 2 additions & 0 deletions
diff --git a/docs/src/warcraft.md b/docs/src/warcraft.md
Lines changed: 155 additions & 0 deletions
diff --git a/src/DynamicVehicleScheduling/DynamicVehicleScheduling.jl b/src/DynamicVehicleScheduling/DynamicVehicleScheduling.jl
Lines changed: 28 additions & 11 deletions
diff --git a/src/DynamicVehicleScheduling/abstract_policy.jl b/src/DynamicVehicleScheduling/abstract_policy.jl
Lines changed: 0 additions & 5 deletions
diff --git a/src/DynamicVehicleScheduling/environment/environment.jl b/src/DynamicVehicleScheduling/environment/environment.jl
Lines changed: 3 additions & 2 deletions
diff --git a/src/DynamicVehicleScheduling/environment/instance.jl b/src/DynamicVehicleScheduling/environment/instance.jl
Lines changed: 0 additions & 8 deletions
@@ -13,6 +13,7 @@ Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
 Graphs = "86223c79-3864-5bf0-83f7-82e725a168b6"
 HiGHS = "87dc4568-4c63-4d18-b0c0-bb2238e4078b"
 Images = "916415d5-f1e6-5110-898d-aaa5f9f070e0"
+InferOpt = "4846b161-c94e-4150-8dac-c7ae193c601f"
 Ipopt = "b6b21f68-93f8-5de0-b562-5493be1d77c9"
 IterTools = "c8e1da08-722c-5040-9ed9-7db0dc04731e"
 JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
@@ -40,6 +41,7 @@ Flux = "0.14, 0.15, 0.16"
 Graphs = "1.11"
 HiGHS = "1.9"
 Images = "0.26.1"
+InferOpt = "0.7.0"
 Ipopt = "1.6"
 IterTools = "1.10.0"
 JSON = "0.21.4"
 
@@ -0,0 +1,155 @@
+```@meta
+EditURL = "tutorials/warcraft.jl"
+```
+
+# Path-finding on image maps
+
+In this tutorial, we showcase DecisionFocusedLearningBenchmarks.jl capabilities on one of its main benchmarks: the Warcraft benchmark.
+This benchmark problem is a simple path-finding problem where the goal is to find the shortest path between the top left and bottom right corners of a given image map.
+The map is a 2D image of a 12x12 grid, where each cell has an unknown travel cost that depends on its terrain type.
+
+First, let's load the package and create a benchmark object as follows:
+
+````@example warcraft
+using DecisionFocusedLearningBenchmarks
+b = WarcraftBenchmark()
+````
+
+## Dataset generation
+
+Benchmark objects act as generators for the components needed to build an algorithm that tackles the problem.
+First, every benchmark can generate datasets using the [`generate_dataset`](@ref) method.
+This method takes the benchmark object as its first argument and the number of samples to generate as its second:
+
+````@example warcraft
+dataset = generate_dataset(b, 50);
+nothing #hide
+````
+
+We obtain a vector of [`DataSample`](@ref) objects, containing all needed data for the problem.
+Subdatasets can be created through regular slicing:
+
+````@example warcraft
+train_dataset, test_dataset = dataset[1:45], dataset[46:50]
+````
+
+Indexing the dataset returns a single [`DataSample`](@ref) with four fields: `x`, `instance`, `θ_true`, and `y_true`:
+
+````@example warcraft
+sample = test_dataset[1]
+````
+
+`x` corresponds to the input features, i.e. the input image (a 3D array) in the case of the Warcraft benchmark:
+
+````@example warcraft
+x = sample.x
+````
+
+`θ_true` corresponds to the true, unknown terrain weights. We use the negative of the true weights so that the optimization problem can be formulated as a maximization problem:
+
+````@example warcraft
+θ_true = sample.θ_true
+````
+
+`y_true` corresponds to the optimal shortest path, encoded as a binary matrix:
+
+````@example warcraft
+y_true = sample.y_true
+````
+
+`instance` is not used in this benchmark and is therefore set to `nothing`:
+
+````@example warcraft
+isnothing(sample.instance)
+````
+
+Some benchmarks provide a [`plot_data`](@ref) method to visualize the data:
+
+````@example warcraft
+plot_data(b, sample)
+````
+
+Here we can see the terrain image, the true terrain weights, and the true shortest path, which avoids the high-cost cells.
+
+## Building a pipeline
+
+DecisionFocusedLearningBenchmarks.jl also provides methods to build a hybrid machine learning and combinatorial optimization pipeline for the benchmark.
+First, the [`generate_statistical_model`](@ref) method generates a machine learning predictor to predict cell weights from the input image:
+
+````@example warcraft
+model = generate_statistical_model(b)
+````
+
+In the case of the Warcraft benchmark, the model is a convolutional neural network built using the Flux.jl package.
+
+````@example warcraft
+θ = model(x)
+````
+
+Note that the model is not trained yet, and its parameters are randomly initialized.
+
+Finally, the [`generate_maximizer`](@ref) method can be used to generate a combinatorial optimization algorithm that takes the predicted cell weights as input and returns the corresponding shortest path:
+
+````@example warcraft
+maximizer = generate_maximizer(b; dijkstra=true)
+````
+
+In the case of the Warcraft benchmark, the method has an additional keyword argument to choose the algorithm to use: Dijkstra's algorithm or the Bellman-Ford algorithm.
+
+````@example warcraft
+y = maximizer(θ)
+````
+
+As we can see, currently the pipeline predicts random noise as cell weights, and therefore the maximizer returns a straight line path.
+
+````@example warcraft
+plot_data(b, DataSample(; x, θ_true=θ, y_true=y))
+````
+
+We can evaluate the current pipeline performance using the optimality gap metric:
+
+````@example warcraft
+starting_gap = compute_gap(b, test_dataset, model, maximizer)
+````
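Note on the metric: the tutorial does not spell out what `compute_gap` measures. A minimal sketch, assuming the gap is the mean relative excess cost of the predicted path over the optimal one, evaluated with the true costs. The helpers `path_cost` and `relative_gap` are hypothetical names for illustration and are not the package's API; the package's actual definition may differ.

```julia
using Statistics: mean

# Hypothetical helper: cost of a path encoded as a binary matrix `y`
# under per-cell costs `c` (same shape as `y`).
path_cost(c, y) = sum(c .* y)

# Hypothetical helper: mean relative excess cost of predicted paths
# over optimal paths, both evaluated with the true costs.
function relative_gap(costs, y_preds, y_opts)
    gaps = map(costs, y_preds, y_opts) do c, yp, yo
        (path_cost(c, yp) - path_cost(c, yo)) / path_cost(c, yo)
    end
    return mean(gaps)
end

c = [1.0 5.0; 1.0 1.0]   # true cell costs on a toy 2x2 grid
y_opt = [1 0; 1 1]       # optimal path, cost 3
y_pred = [1 1; 0 1]      # predicted path, cost 7
gap = relative_gap([c], [y_pred], [y_opt])
```

Under this assumed definition, a gap of 0 means the pipeline recovers optimal paths, and larger values mean costlier predicted paths.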
+
+## Using a learning algorithm
+
+We can now train the model using the InferOpt.jl package:
+
+````@example warcraft
+using InferOpt
+using Flux
+using Plots
+
+perturbed_maximizer = PerturbedMultiplicative(maximizer; ε=0.2, nb_samples=100)
+loss = FenchelYoungLoss(perturbed_maximizer)
+
+starting_gap = compute_gap(b, test_dataset, model, maximizer)
+
+opt_state = Flux.setup(Adam(1e-3), model)
+loss_history = Float64[]
+for epoch in 1:50
+    val, grads = Flux.withgradient(model) do m
+        sum(loss(m(x), y_true) for (; x, y_true) in train_dataset) / length(train_dataset)
+    end
+    Flux.update!(opt_state, model, grads[1])
+    push!(loss_history, val)
+end
+
+plot(loss_history; xlabel="Epoch", ylabel="Loss", title="Training loss")
+````
+
+````@example warcraft
+final_gap = compute_gap(b, test_dataset, model, maximizer)
+````
+
+````@example warcraft
+θ = model(x)
+y = maximizer(θ)
+plot_data(b, DataSample(; x, θ_true=θ, y_true=y))
+````
+
+---
+
+*This page was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*
+
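Note on the training block: the tutorial wraps the maximizer in `PerturbedMultiplicative` without explaining what the wrapper does. A rough sketch of the idea, assuming the perturbation multiplies each weight by `exp(ε * Z - ε^2 / 2)` with `Z` standard normal and averages the maximizer outputs by Monte Carlo. The toy maximizer `one_hot_argmax` and the helper `perturbed_expectation` are hypothetical names; this is not InferOpt's implementation.

```julia
using Random: MersenneTwister
using Statistics: mean

# Toy maximizer (stand-in for a shortest-path oracle):
# returns a one-hot vector at the largest entry of θ.
function one_hot_argmax(θ)
    y = zeros(length(θ))
    y[argmax(θ)] = 1.0
    return y
end

# Hypothetical helper: Monte Carlo estimate of
# E[maximizer(θ .* exp.(ε .* Z .- ε^2 / 2))] with Z standard normal.
function perturbed_expectation(maximizer, θ; ε=0.2, nb_samples=100, rng=MersenneTwister(0))
    samples = map(1:nb_samples) do _
        Z = randn(rng, length(θ))
        maximizer(θ .* exp.(ε .* Z .- ε^2 / 2))
    end
    return mean(samples)
end

θ = [1.0, 1.1, 3.0]
y_mean = perturbed_expectation(one_hot_argmax, θ)
```

The unperturbed maximizer is piecewise constant in `θ`, so its gradient carries no learning signal. Averaging over perturbations yields an output that varies smoothly with `θ`, which is what makes a loss such as `FenchelYoungLoss` usable with gradient-based training.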
@@ -5,11 +5,10 @@ using ..Utils
 using Base: @kwdef
 using CommonRLInterface: CommonRLInterface, AbstractEnv, reset!, terminated, observe, act!
 using DataDeps: @datadep_str
-# using ChainRulesCore
 using DocStringExtensions: TYPEDEF, TYPEDFIELDS, TYPEDSIGNATURES
 using Graphs
 using HiGHS
-# using InferOpt
+using InferOpt: LinearMaximizer
 using IterTools: partition
 using JSON
 using JuMP
@@ -21,8 +20,6 @@ using Statistics: mean, quantile
 
 include("utils.jl")
 
-include("abstract_policy.jl")
-
 # static vsp stuff
 include("static_vsp/instance.jl")
 include("static_vsp/parsing.jl")
@@ -41,20 +38,40 @@ include("algorithms/anticipative_solver.jl")
 
 include("learning/features.jl")
 include("learning/2d_features.jl")
-include("learning/dataset.jl")
 
 include("policy/abstract_vsp_policy.jl")
 include("policy/greedy_policy.jl")
 include("policy/lazy_policy.jl")
 include("policy/anticipative_policy.jl")
 include("policy/kleopatra_policy.jl")
 
-struct DVSPBenchmark <: AbstractDynamicBenchmark end
+include("maximizer.jl")
+
+"""
+$TYPEDEF
+
+Benchmark for the dynamic vehicle scheduling problem.
+"""
+@kwdef struct DVSPBenchmark <: AbstractDynamicBenchmark
+    max_requests_per_epoch::Int = 10
+    Δ_dispatch::Float64 = 1.0
+    epoch_duration::Float64 = 1.0
+end
 
-function Utils.generate_sample(b::DVSPBenchmark, rng::AbstractRNG)
-    return DataSample(;
-        instance=Instance(read_vsp_instance(readdir(datadep"dvrptw"; join=true)[1]))
-    )
+function Utils.generate_dataset(b::DVSPBenchmark, dataset_size::Int=1)
+    (; max_requests_per_epoch, Δ_dispatch, epoch_duration) = b
+    files = readdir(datadep"dvrptw"; join=true)
+    dataset_size = min(dataset_size, length(files))
+    return [
+        DataSample(;
+            instance=Instance(
+                read_vsp_instance(files[i]);
+                max_requests_per_epoch,
+                Δ_dispatch,
+                epoch_duration,
+            ),
+        ) for i in 1:dataset_size
+    ]
 end
 
 function Utils.generate_scenario_generator(::DVSPBenchmark)
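Note on `generate_dataset`: the new method caps `dataset_size` at the number of available instance files, so a caller can receive fewer samples than requested without any warning. A minimal sketch of that clamping behaviour, using a hypothetical stand-in `take_instances` and made-up file names rather than the real data dependency:

```julia
# Hypothetical stand-in for the clamping logic in `generate_dataset`:
# return at most `dataset_size` entries, never more than there are files.
function take_instances(files::Vector{String}, dataset_size::Int=1)
    n = min(dataset_size, length(files))
    return [files[i] for i in 1:n]
end

files = ["a.txt", "b.txt", "c.txt"]   # made-up file names
small = take_instances(files, 2)       # 2 entries, as requested
capped = take_instances(files, 10)     # only 3 entries, silently capped
```

Callers that rely on an exact dataset size may want to compare `length(dataset)` against the requested size.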
@@ -70,7 +87,7 @@ function Utils.generate_environment(::DVSPBenchmark, instance::Instance; kwargs.
 end
 
 function Utils.generate_maximizer(::DVSPBenchmark)
-    return prize_collecting_vsp
+    return LinearMaximizer(oracle; g, h)
 end
 
 export DVSPBenchmark #, generate_environment # , generate_sample, generate_anticipative_solver
 
@@ -45,12 +45,13 @@ $TYPEDSIGNATURES
 Get the planning start time of the environment, i.e. the time at which vehicles routes dispatched in current epoch can depart.
 """
 planning_start_time(env::DVSPEnv) = time(env) + Δ_dispatch(env)
+
 """
 $TYPEDSIGNATURES
 
-Check if the episode is terminated, i.e. if the current epoch is the last one.
+Check if the episode is terminated, i.e. if the current epoch is past the last one.
 """
-CommonRLInterface.terminated(env::DVSPEnv) = current_epoch(env) >= last_epoch(env)
+CommonRLInterface.terminated(env::DVSPEnv) = current_epoch(env) > last_epoch(env)
 
 """
 $TYPEDSIGNATURES
@@ -69,7 +70,7 @@ remove dispatched customers, advance time, and add new requests to the environme
 function CommonRLInterface.act!(env::DVSPEnv, routes, scenario=env.scenario)
     reward = -apply_routes!(env.state, routes)
     env.state.current_epoch += 1
-    if current_epoch(env) <= last_epoch(env)
+    if !CommonRLInterface.terminated(env)
         add_new_customers!(env.state, env.instance; scenario[current_epoch(env)]...)
     end
     return reward
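Note on the `terminated` change: switching from `>=` to `>` changes how many epochs a rollout loop plays. A minimal sketch with a toy environment, assuming epochs are numbered from 1 and that the loop checks `terminated` before each action. `ToyEnv`, `terminated_old`, `terminated_new`, and `epochs_played` are hypothetical names, not the package's `DVSPEnv` API.

```julia
# Toy environment, not the package's `DVSPEnv`: it only tracks the epoch counter.
mutable struct ToyEnv
    current_epoch::Int
    last_epoch::Int
end

terminated_old(env::ToyEnv) = env.current_epoch >= env.last_epoch  # before the change
terminated_new(env::ToyEnv) = env.current_epoch > env.last_epoch   # after the change

# Count how many epochs a rollout loop plays before `terminated` returns true.
function epochs_played(terminated, last_epoch)
    env = ToyEnv(1, last_epoch)
    played = 0
    while !terminated(env)
        played += 1
        env.current_epoch += 1   # mirrors the increment in `act!`
    end
    return played
end

old_count = epochs_played(terminated_old, 3)
new_count = epochs_played(terminated_new, 3)
```

In this toy setting the old check stops before the last epoch is played, while the new check lets the loop act in every epoch up to and including `last_epoch`.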
 
@@ -14,8 +14,6 @@ Instance data structure for the dynamic vehicle scheduling problem.
     epoch_duration::T = 1.0
     "last epoch index"
     last_epoch::Int
-    # "seed for customer sampling"
-    # seed::S
 end
 
 function Instance(
@@ -44,9 +42,3 @@ end
 epoch_duration(instance::Instance) = instance.epoch_duration
 last_epoch(instance::Instance) = instance.last_epoch
 max_requests_per_epoch(instance::Instance) = instance.max_requests_per_epoch
-# static_instance(instance::Instance) = instance.static_instance
-
-# duration(instance::Instance) = duration(instance.static_instance)
-# service_time(instance::Instance) = service_time(instance.static_instance)
-# coordinate(instance::Instance) = coordinate(instance.static_instance)
-# start_time(instance::Instance) = start_time(instance.static_instance)