Consider support of IUPAC codes in matrixOfRandomBarcodes or convenience function to derive possible barcodes #4

@j-andrews7

Rather than being truly random, some barcode libraries are semi-random and adhere to a given structure (e.g. ClonTracer). In such cases, the ability to provide constrained barcode templates would be convenient.

Of course, the user could just generate them themselves and feed them to choices in the appropriate function with something like:

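library(Biostrings)  # not strictly needed: the code below only uses base R

# IUPAC nucleotide codes mapped to the bases each one can represent.
iupac_codes <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("G", "C"),
  W = c("A", "T"), K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"), H = c("A", "C", "T"),
  V = c("A", "C", "G"), N = c("A", "C", "G", "T")
)

# Expand an IUPAC template into every concrete barcode it can produce.
generate_possible_barcodes <- function(dna_string) {
  chars <- strsplit(toupper(dna_string), split = "")[[1]]

  # Fail early on characters that are not IUPAC codes; the list lookup
  # would otherwise silently return NULL for them.
  invalid <- setdiff(chars, names(iupac_codes))
  if (length(invalid) > 0) {
    stop("Not IUPAC nucleotide codes: ", paste(invalid, collapse = ", "))
  }

  # One vector of allowed bases per position.
  possible_sequences_list <- lapply(chars, function(char) iupac_codes[[char]])

  # All combinations, with the first variable position changing fastest.
  sequences <- expand.grid(possible_sequences_list, stringsAsFactors = FALSE)

  # Paste the columns of each row together into one barcode string.
  do.call(paste0, sequences)
}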

sequences <- generate_possible_barcodes("ACTGWSWSWSAA")
print(sequences)
 [1] "ACTGAGAGAGAA" "ACTGTGAGAGAA" "ACTGACAGAGAA" "ACTGTCAGAGAA" "ACTGAGTGAGAA" "ACTGTGTGAGAA" "ACTGACTGAGAA" "ACTGTCTGAGAA" "ACTGAGACAGAA"
[10] "ACTGTGACAGAA" "ACTGACACAGAA" "ACTGTCACAGAA" "ACTGAGTCAGAA" "ACTGTGTCAGAA" "ACTGACTCAGAA" "ACTGTCTCAGAA" "ACTGAGAGTGAA" "ACTGTGAGTGAA"
[19] "ACTGACAGTGAA" "ACTGTCAGTGAA" "ACTGAGTGTGAA" "ACTGTGTGTGAA" "ACTGACTGTGAA" "ACTGTCTGTGAA" "ACTGAGACTGAA" "ACTGTGACTGAA" "ACTGACACTGAA"
[28] "ACTGTCACTGAA" "ACTGAGTCTGAA" "ACTGTGTCTGAA" "ACTGACTCTGAA" "ACTGTCTCTGAA" "ACTGAGAGACAA" "ACTGTGAGACAA" "ACTGACAGACAA" "ACTGTCAGACAA"
[37] "ACTGAGTGACAA" "ACTGTGTGACAA" "ACTGACTGACAA" "ACTGTCTGACAA" "ACTGAGACACAA" "ACTGTGACACAA" "ACTGACACACAA" "ACTGTCACACAA" "ACTGAGTCACAA"
[46] "ACTGTGTCACAA" "ACTGACTCACAA" "ACTGTCTCACAA" "ACTGAGAGTCAA" "ACTGTGAGTCAA" "ACTGACAGTCAA" "ACTGTCAGTCAA" "ACTGAGTGTCAA" "ACTGTGTGTCAA"
[55] "ACTGACTGTCAA" "ACTGTCTGTCAA" "ACTGAGACTCAA" "ACTGTGACTCAA" "ACTGACACTCAA" "ACTGTCACTCAA" "ACTGAGTCTCAA" "ACTGTGTCTCAA" "ACTGACTCTCAA"
[64] "ACTGTCTCTCAA"
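
One caveat with full enumeration is that the number of barcodes is the product of the choices at each position, so a template with many `N` positions quickly becomes too large for `expand.grid()`. A minimal base R sketch of two alternatives follows: counting the barcodes without building them, and drawing random barcodes directly from the template. The names `iupac_map`, `count_template_barcodes` and `sample_template_barcodes` are hypothetical helpers made up for illustration; they are not part of any package.

```r
# Hypothetical helpers for illustration only (not from any package).
# IUPAC code -> string of the bases it can represent.
iupac_map <- c(
  A = "A", C = "C", G = "G", T = "T",
  R = "AG", Y = "CT", S = "CG", W = "AT", K = "GT", M = "AC",
  B = "CGT", D = "AGT", H = "ACT", V = "ACG", N = "ACGT"
)

# Number of distinct barcodes a template can produce:
# the product of the number of allowed bases at each position.
count_template_barcodes <- function(template) {
  chars <- strsplit(toupper(template), "")[[1]]
  stopifnot(all(chars %in% names(iupac_map)))
  prod(nchar(iupac_map[chars]))
}

# Draw n barcodes at random (with replacement) from a template,
# sampling each position independently from its allowed bases.
sample_template_barcodes <- function(template, n) {
  chars <- strsplit(toupper(template), "")[[1]]
  stopifnot(all(chars %in% names(iupac_map)))
  cols <- lapply(chars, function(ch) {
    bases <- strsplit(iupac_map[[ch]], "")[[1]]
    # sample.int() avoids the sample() surprise for length-1 vectors.
    bases[sample.int(length(bases), n, replace = TRUE)]
  })
  do.call(paste0, cols)
}

count_template_barcodes("ACTGWSWSWSAA")      # 64, matching the output above
sample_template_barcodes("ACTGWSWSWSAA", 5)  # 5 random barcodes from the template
```

Sampling per position keeps memory proportional to the number of barcodes requested rather than to the size of the full library, at the cost of possible duplicates among the draws.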

But what's the fun in that, really? There are probably simpler ways to do it; I am not great with Biostrings.
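
For what it's worth, Biostrings already ships the code table as `IUPAC_CODE_MAP`, a named character vector such as `W = "AT"`, so the hand-written list may not be needed. A rough sketch, assuming Biostrings is installed; `expand_iupac_template` is a hypothetical name, not an existing function:

```r
library(Biostrings)

# Hypothetical helper: expand an IUPAC template using Biostrings' own code table.
expand_iupac_template <- function(template) {
  chars <- strsplit(toupper(template), "")[[1]]
  stopifnot(all(chars %in% names(IUPAC_CODE_MAP)))

  # Split each position's code string (e.g. "AT") into its individual bases.
  choices <- strsplit(IUPAC_CODE_MAP[chars], "")

  # unname() so that repeated codes do not yield duplicate column names.
  grid <- expand.grid(unname(choices), stringsAsFactors = FALSE)
  do.call(paste0, grid)
}

barcodes <- expand_iupac_template("ACTGWSWSWSAA")
length(barcodes)
```

This should give the same set of barcodes as the function above, although possibly in a different order, because the bases within each code may be listed differently in `IUPAC_CODE_MAP`.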
