improve parallelization #26

@swhalemwo

At the moment, calling stata() from mclapply() does not work reliably. For example, the following call never finishes:

library(RStata)
library(parallel)
library(dplyr)

options(RStata.StataPath = "/usr/local/stata14/stata")
options(RStata.StataVersion = 14)

'%!in%' <- function(x,y)!('%in%'(x,y))


stata_test <- function(x) {

    # last two digits of the worker PID, to tell the processes apart in the output
    pid <- as.character(Sys.getpid()) %>% substring(nchar(.)-1, nchar(.))
    print(paste0(pid, "--", x))

    stata("ds", data.in = ToothGrowth, stata.echo = FALSE)

}
    
mclapply(seq(40), stata_test, mc.cores = 8)
Results
[1] "45--1"
[1] "46--2"
[1] "51--3"
[1] "54--4"
[1] "55--5"
[1] "56--6"
[1] "57--7"
[1] "64--8"
[1] "45--9"
[1] "46--10"
[1] "56--14"
[1] "57--15"
[1] "51--11"
[1] "54--12"
[1] "55--13"
[1] "45--17"
[1] "57--23"
[1] "56--22"
[1] "51--19"
[1] "45--25"
[1] "56--30"
[1] "57--31"
[1] "51--27"
[1] "45--33"
[1] "57--39"
[1] "56--38"
[1] "51--35"

I think this is due to stata() relying on local files that it later removes with unlink: multiple processes may then try to read and write the same file, which results in some processes (e.g. 46 and 64) not finishing. A slightly modified version with a separate directory for each process works fine:

stata_test2 <- function(x) {

    pid <- Sys.getpid()
    cur_wd <- getwd()
    # one working directory per worker process, named after its PID
    new_dir <- paste0(cur_wd, "/", pid)

    present_dirs <- list.dirs(cur_wd, recursive = FALSE)

    if (new_dir %!in% present_dirs) {
        # dir.create() avoids shelling out to mkdir
        dir.create(new_dir)
    }

    setwd(new_dir)
    # restore the original working directory even if stata() fails
    on.exit(setwd(cur_wd), add = TRUE)

    print(paste0(pid, "--", x))
    stata("ds", data.in = ToothGrowth, stata.echo = FALSE)

}
    

mclapply(seq(40), stata_test2, mc.cores = 8)

This approach works for my purposes for now, but maybe it can be integrated (more elegantly) into RStata directly?
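
To make the idea a bit more concrete, here is a minimal sketch of how the per-process directory could be factored into a reusable helper. The name with_private_wd and the "rstata-" directory prefix are hypothetical and not part of RStata; the sketch assumes that running each worker in its own working directory is enough to avoid the file clashes, which is what the workaround above suggests but which I have not verified against the package internals.

```r
# Hypothetical helper (not part of RStata): run `fun` inside a working
# directory that is private to the current process, then switch back.
with_private_wd <- function(fun, ...) {
    # forked workers share the parent's tempdir(), so add the PID
    new_dir <- file.path(tempdir(), paste0("rstata-", Sys.getpid()))
    dir.create(new_dir, showWarnings = FALSE, recursive = TRUE)

    # setwd() returns the previous working directory
    old_wd <- setwd(new_dir)
    # restore it even if `fun` throws an error
    on.exit(setwd(old_wd), add = TRUE)

    fun(...)
}

# Small check that does not need Stata: report the directory used
print(basename(with_private_wd(function() getwd())))
```

With RStata loaded, the test function from above would then shrink to something like stata_test3 <- function(x) with_private_wd(function() stata("ds", data.in = ToothGrowth, stata.echo = FALSE)), called via mclapply(seq(40), stata_test3, mc.cores = 8) as before. Using tempdir() instead of the current directory is a design choice of this sketch, so that no PID-named folders are left behind in the project directory.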
