Using column generation in constraint-and-column generation for adjustable robust optimization > JSP
Loading the data
Our results can be found in the results.jsp.csv file, which has the following columns:
- “tag”: a tag always equal to “result”, used to grep the result lines in our execution log files.
- “instance”: the path to the instance.
- “standard_phase_time_limit”: the time limit for the standard phase (i.e., without using CG).
- “master_solver”: the solver used for the CCG master problem: “STD” for the standard approach, i.e., Gurobi; “CG” for column generation.
- “status”: the final status.
- “reason”: the final status reason.
- “has_large_scaled”: true if the CG phase has been started, false otherwise.
- “n_iterations”: the number of iterations.
- “total_time”: the total time to solve the problem.
- “master_time”: the time spent solving the master problem.
- “adversarial_time”: the time spent solving the adversarial problem.
- “best_bound”: the best bound found.
- “best_obj”: the best feasible point value.
- “relative_gap”: the final relative gap.
- “absolute_gap”: the final absolute gap.
- “adversarial_unexpected_status”: the status of the adversarial problem solver if it is not Optimal.
- “with_heuristic”: true if the CG-based heuristic is used.
- “with_non_optimal_pricing”: always false.
- “n_jobs”: the number of jobs in the instance.
- “Gamma”: the value for the uncertainty budget \(\Gamma\).
- “blank”: this column is left blank.
data = read.csv("results.csv", header = FALSE)
colnames(data) = c("slurm_file", "tag", "instance", "standard_phase_time_limit_raw", "Gamma", "with_heuristic", "standard_phase_time_limit", "master_solver", "status", "reason", "has_large_scaled", "n_iterations", "total_time", "master_time", "adversarial_time", "best_bound", "best_obj", "relative_gap", "absolute_gap", "second_stage.mean", "second_stage.std_dev", "adversarial_unexpected_status", "memory_used", "memory_limit")
data = data %>% mutate(instance = basename(instance),
n_jobs = as.numeric(sub(".*N(\\d+)_R.*", "\\1", instance))
)
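The regular expression above extracts the number of jobs from the instance file name. As a quick illustration (the file name below is hypothetical; only the “N&lt;digits&gt;_R” part matters):

# Hypothetical instance file name, assuming names embed the number of jobs as "N<digits>_R"
sub(".*N(\\d+)_R.*", "\\1", "jsp_N15_R10.txt")
## [1] "15"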
For homogeneity, we fix the total_time of unsolved instances to the time limit. Note that this must be done before dropping the “tag” column, since unsolved runs are identified by an “iteration” tag.
if (sum(data$tag == "iteration") > 0) {
  data[data$tag == "iteration",]$total_time = 10800
}
if (sum(data$total_time > 10800) > 0) {
  data[data$total_time > 10800,]$total_time = 10800
}
We then remove the “tag” and the “blank” columns.
data = data[, !(names(data) %in% c("tag", "blank"))]
Then, we create a column named “method” which gives a specific name to each method, combining the approach used to solve the CCG master problem, the time limit of the standard phase, and a flag indicating whether the CG-based heuristic was used.
data$method = paste0(data$master_solver, "_TL", data$standard_phase_time_limit, "_H", data$with_heuristic)
unique(data$method)
## [1] "CG_TL0_Htrue" "STD_TLInf_Htrue"
Our final data is now ready for analysis.
Sanity Check
# Define the relative tolerance
tolerance <- 10^-2
# Filter the data where time < 18000 and group by 'instance'
validation <- data %>%
filter(total_time < 18000) %>%
group_by(instance) %>%
summarise(min_best_obj = min(best_obj), max_best_obj = max(best_obj)) %>%
mutate(valid = (max_best_obj - min_best_obj) / min_best_obj <= tolerance)
# Check if all instances are valid
if (all(validation$valid)) {
print("All methods find the same best_obj value within the relative tolerance for all instances.")
} else {
print("Methods do not find the same best_obj value within the relative tolerance for some instances.")
print(validation %>% filter(!valid)) # Show the instances that failed validation
}
## [1] "All methods find the same best_obj value within the relative tolerance for all instances."
Empirical Cumulative Distribution Function (ECDF)
We plot the ECDF of computation time over our set of instances for all approaches.
ggplot(data, aes(x = total_time, col = method)) + stat_ecdf(pad = FALSE) +
coord_cartesian(xlim = c(0,10800)) +
#scale_x_log10() +
theme_minimal()
ggplot(data, aes(x = memory_used, col = method)) +
stat_ecdf(pad = FALSE) +
theme_minimal()
ggplot(data, aes(x = master_time / n_iterations, col = method)) + stat_ecdf(pad = FALSE) +
# coord_cartesian(xlim = c(0,18000)) +
theme_minimal()
We export these results to CSV files so that they can be plotted with TikZ.
data_with_ecdf = data %>%
group_by(method) %>%
arrange(total_time) %>%
mutate(ecdf_value = ecdf(total_time)(total_time)) %>%
ungroup()
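Here, `ecdf(total_time)` builds the empirical CDF of each group as a step function, which is then evaluated at every observation. A minimal sketch of this idiom on toy data:

# ecdf() returns a step function; evaluating it at the data gives cumulative proportions
x = c(2, 5, 5, 10)
Fn = ecdf(x)
Fn(x)
## [1] 0.25 0.75 0.75 1.00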
for (method in unique(data_with_ecdf$method)) {
output = data_with_ecdf[data_with_ecdf$method == method,]
output = output[,c("total_time", "ecdf_value")]
output$log_total_time = log10(output$total_time)
output = output[output$total_time < 18000,]
write.csv(output, file = paste0(method, ".csv"), row.names = FALSE)
}
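As a quick check, one of the exported files (here CG_TL0_Htrue.csv, produced by the loop above) can be reloaded and inspected:

# Inspect the first rows of one exported file
head(read.csv("CG_TL0_Htrue.csv"))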
Summary table
In this section, we create a table summarizing the main outcomes of our computational experiments.
We first focus on the solved instances.
summary_data_lt_18000 <- data %>%
filter(total_time < 18000) %>%
group_by(n_jobs, Gamma, method) %>%
summarize(
avg_total_time = mean(total_time, na.rm = TRUE),
avg_master_time = mean(master_time, na.rm = TRUE),
avg_adversarial_time = mean(adversarial_time, na.rm = TRUE),
avg_n_iterations = mean(n_iterations, na.rm = TRUE),
sum_has_large_scaled = sum(has_large_scaled),
num_lines = n(),
.groups = "drop"
) %>%
ungroup() %>%
arrange(n_jobs, Gamma, method)
Then, we compute averages over the unsolved instances.
summary_data_ge_18000 <- data %>%
filter(total_time >= 18000) %>%
group_by(n_jobs, Gamma, method) %>%
summarize(
avg_n_iterations_unsolved = mean(n_iterations, na.rm = TRUE),
num_lines_unsolved = n()
) %>%
ungroup() %>%
arrange(n_jobs, Gamma, method)
## `summarise()` has grouped output by 'n_jobs', 'Gamma'. You can override using
## the `.groups` argument.
Finally, we merge our results.
transposed_data_lt_18000 <- summary_data_lt_18000 %>%
pivot_wider(names_from = method, values_from = avg_total_time:num_lines)
transposed_data_ge_18000 <- summary_data_ge_18000 %>%
pivot_wider(names_from = method, values_from = avg_n_iterations_unsolved:num_lines_unsolved) %>%
select(-n_jobs, -Gamma)
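The composite column names in the table below (e.g., avg_total_time_CG_TL0_Htrue) come from pivot_wider(): with several values_from columns, it pastes each value-column name together with the corresponding method value. A toy sketch with illustrative numbers:

# Illustrative data only; the real pipeline uses summary_data_lt_18000
toy = data.frame(n_jobs = c(15, 15),
                 method = c("CG", "STD"),
                 avg_total_time = c(563.2, 346.8),
                 num_lines = c(80, 80))
toy %>% pivot_wider(names_from = method, values_from = avg_total_time:num_lines)
# yields columns: n_jobs, avg_total_time_CG, avg_total_time_STD, num_lines_CG, num_lines_STD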
transposed_data_lt_18000 %>% kable()
n_jobs | Gamma | avg_total_time_CG_TL0_Htrue | avg_total_time_STD_TLInf_Htrue | avg_master_time_CG_TL0_Htrue | avg_master_time_STD_TLInf_Htrue | avg_adversarial_time_CG_TL0_Htrue | avg_adversarial_time_STD_TLInf_Htrue | avg_n_iterations_CG_TL0_Htrue | avg_n_iterations_STD_TLInf_Htrue | sum_has_large_scaled_CG_TL0_Htrue | sum_has_large_scaled_STD_TLInf_Htrue | num_lines_CG_TL0_Htrue | num_lines_STD_TLInf_Htrue |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 3 | 563.2457 | 346.8273 | 550.9291 | 214.5810 | 4.015344 | 6.794467 | 78.4500 | 144.2750 | 80 | 0 | 80 | 80 |
15 | 4 | 914.8133 | 433.6667 | 903.6672 | 263.6355 | 8.833595 | 14.665505 | 89.6750 | 164.3125 | 80 | 0 | 80 | 80 |
20 | 3 | 2198.3943 | 590.0528 | 2306.7129 | 397.6761 | 14.251847 | 40.854515 | 60.6250 | 132.6500 | 80 | 0 | 80 | 80 |
20 | 4 | 3177.2821 | 908.6789 | 3372.9363 | 608.1697 | 49.963325 | 99.334654 | 77.6625 | 169.6625 | 80 | 0 | 80 | 80 |
transposed_data_ge_18000 %>% kable()
#cbind(
# transposed_data_lt_18000,
# transposed_data_ge_18000
#) %>%
# kable() %>%
# kable_styling(full_width = FALSE, position = "center")
Second-stage Deviations
We plot, for each method, the relative standard deviation of the second-stage value (i.e., its standard deviation divided by its absolute mean).
ggplot(data, aes(x = method, y = second_stage.std_dev / abs(second_stage.mean))) +
  geom_boxplot() +
  labs(title = "Relative deviation of the second-stage value",
       x = "Method",
       y = "Std. dev. / |mean|") +
  theme_minimal()
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
This page is automatically regenerated after every git push action on the public repository hlefebvr/hlefebvr.github.io using rmarkdown and GitHub Actions. This ensures the reproducibility of our data manipulation. The last compilation was performed on 20/11/24 at 21:13:32.