Adam Moody edited this page Oct 11, 2013 · 9 revisions

Features:

  • long-running jobs: report progress to the user, continue where they left off after an interruption (checkpoint/restart), and provide a common method to halt a job
  • invoke standard Linux tools where possible, e.g., grep
  • parallel techniques: master/worker, distributed queue, distributed task graph
  • define common file formats for input/output between tools

Components:

  • POSIX I/O wrappers that retry on non-fatal errors (e.g., EINTR)
  • component to manipulate paths (e.g., basename, dirname, transform /a/b/../c// into /a/c)
  • abstraction for file metadata (stat data) to access fields and transfer them between processes
  • API to read and write file metadata structures to and from files
  • API to filter and sort file metadata structures
  • parallel directory walk
  • parallel pipe from one tool to another
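The retrying POSIX I/O wrapper from the component list might look like the C sketch below. It is not libbayer's actual code, and the names reliable_read() and reliable_write() are hypothetical. EINTR is treated as non-fatal and retried. The loop also continues after short reads and writes, which POSIX allows even when no error occurs.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to count bytes, retrying when read() is
 * interrupted by a signal (EINTR) and continuing after short reads.
 * Returns bytes read (fewer than count only at end of file),
 * or -1 with errno set on a fatal error. */
ssize_t reliable_read(int fd, void *buf, size_t count)
{
    size_t total = 0;
    char *p = buf;
    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* non-fatal: interrupted before any data, retry */
            }
            return -1; /* fatal: errno was set by read() */
        }
        if (n == 0) {
            break; /* end of file */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical helper: write all count bytes, retrying on EINTR and
 * continuing after short writes. Returns count or -1 with errno set. */
ssize_t reliable_write(int fd, const void *buf, size_t count)
{
    size_t total = 0;
    const char *p = buf;
    while (total < count) {
        ssize_t n = write(fd, p + total, count - total);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* non-fatal: retry the same write */
            }
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Tools then call these wrappers in place of read() and write(). Any -1 they return can be treated as a real error.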
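The path transformation in the component list (/a/b/../c// into /a/c) can be done lexically, without touching the file system. The sketch below assumes a hypothetical path_normalize() that returns a malloc'd string. Because the rule is purely lexical, it treats b/.. as a no-op even if b is a symlink, a case that realpath() would resolve differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lexically normalize a path by collapsing repeated
 * slashes, dropping "." components, and resolving ".." against the
 * preceding component. The file system is never consulted.
 * Returns a newly allocated string (caller frees) or NULL on failure. */
char *path_normalize(const char *path)
{
    size_t len = strlen(path);
    int absolute = (path[0] == '/');

    /* The result is never longer than the input; +2 leaves room for "."
     * and the terminating NUL when the input is empty. */
    char *out = malloc(len + 2);
    if (out == NULL) {
        return NULL;
    }
    size_t olen = 0; /* bytes currently in out */
    size_t base = 0; /* prefix of out that ".." must not remove */
    if (absolute) {
        out[olen++] = '/';
        base = 1;
    }

    const char *p = path;
    while (*p != '\0') {
        while (*p == '/') {
            p++; /* skip a run of slashes */
        }
        if (*p == '\0') {
            break;
        }
        const char *start = p; /* find the end of this component */
        while (*p != '\0' && *p != '/') {
            p++;
        }
        size_t clen = (size_t)(p - start);

        if (clen == 1 && start[0] == '.') {
            continue; /* "." names the current directory: drop it */
        }
        if (clen == 2 && start[0] == '.' && start[1] == '.') {
            if (olen > base) {
                /* remove the last component and the slash before it */
                while (olen > base && out[olen - 1] != '/') {
                    olen--;
                }
                if (olen > base) {
                    olen--;
                }
                continue;
            }
            if (absolute) {
                continue; /* "/.." is the same directory as "/" */
            }
            /* a leading ".." in a relative path is kept below */
        }

        if (olen > 0 && out[olen - 1] != '/') {
            out[olen++] = '/';
        }
        memcpy(out + olen, start, clen);
        olen += clen;
        if (clen == 2 && start[0] == '.' && start[1] == '.') {
            base = olen; /* a kept ".." must not be cancelled later */
        }
    }

    if (olen == 0) {
        out[olen++] = '.'; /* e.g. "a/.." or "" */
    }
    out[olen] = '\0';
    return out;
}
```

For example, path_normalize("/a/b/../c//") returns "/a/c", and path_normalize("a/..") returns ".". The caller frees the result in both cases.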

Tools:

  • list
  • find
  • copy
  • rsync
  • remove
  • tar/zip
  • grep
  • compare

Initial file systems to integrate with:

  • Lustre
  • Panasas
  • GPFS
  • NFS

Initial middleware to integrate with:

  • PLFS
  • SCR
  • ADIOS

Tips:

While reading through the tar code today to see how it handles xattrs, I came across an answer to the sub-second timestamp question. tar uses functions like get_stat_atime(), defined in gnulib's stat-time.h, to fetch the timestamp from a stat structure: http://www.gnu.org/software/gnulib/coverage/gllib/stat-time.h.gcov.frameset.html It then uses utimensat() to set the timestamps.
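The tar approach can be sketched as follows. The sketch assumes a POSIX.1-2008 system, where struct stat exposes the st_atim and st_mtim timespec fields; this is the same data that get_stat_atime() and related gnulib functions return portably. copy_timestamps() is a hypothetical helper name. The change time (ctime) cannot be set this way, because the kernel updates it itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>  /* struct stat, lstat(), utimensat() */

/* Hypothetical helper: copy the access and modification times of src onto
 * dst with nanosecond resolution. lstat() together with AT_SYMLINK_NOFOLLOW
 * means a symlink's own times are read and set, not its target's.
 * Returns 0 on success, -1 with errno set. */
int copy_timestamps(const char *src, const char *dst)
{
    struct stat st;
    if (lstat(src, &st) != 0) {
        return -1;
    }
    struct timespec times[2];
    times[0] = st.st_atim; /* access time */
    times[1] = st.st_mtim; /* modification time */
    return utimensat(AT_FDCWD, dst, times, AT_SYMLINK_NOFOLLOW);
}
```

A copy tool would call this after writing the destination's data, since writing the data afterward would update its mtime again. How much sub-second precision survives depends on the destination file system.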

Overview:

The github.com/hpc URL is a GitHub "organization," a grouping of related projects; bayer is one of them. We created the bayer project just for this collaboration. Most of the projects under the hpc organization are open to anyone, but we made bayer private until we release it. The github.com/hpc/bayer URL is the main page for the bayer project.

The dcp code lives outside of bayer as its own github.com/hpc project because dcp existed before we started the bayer effort. The same is true of libcircle, dtcmp, and lwgrp: we use all of these components within bayer, but they existed before our collaboration began.

The libbayer library is where we share common code between tools, e.g., "reliable" POSIX I/O calls, memory allocation routines, certain canned uses of libcircle, and the like. Whenever a routine is useful to more than one tool, let's keep it in libbayer.
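The sketch below shows the kind of shared memory allocation routine mentioned here. It wraps malloc() so callers never need to check for NULL: on failure, it reports the requesting file and line and then aborts. The names util_malloc() and UTIL_MALLOC are hypothetical. A tool running under MPI might prefer MPI_Abort() over abort().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical shared allocator: never returns NULL for a nonzero size.
 * On failure it prints where the request came from and aborts. */
void *util_malloc(size_t size, const char *file, int line)
{
    if (size == 0) {
        return NULL; /* nothing to allocate; free(NULL) is a no-op */
    }
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "ERROR: failed to allocate %zu bytes at %s:%d\n",
                size, file, line);
        abort();
    }
    return p;
}

/* Record the caller's location automatically. */
#define UTIL_MALLOC(size) util_malloc((size), __FILE__, __LINE__)
```

Keeping the wrapper in one shared library gives every tool the same failure behavior and error message format.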

So the whole picture looks something like this:


Projects under the github.com/hpc organization:

lwgrp: light-weight group library

  • implements collectives using light-weight representations of MPI communicators

dtcmp: datatype comparison library

  • implements parallel sort algorithms
  • uses: lwgrp

libcircle: load balancing library

dcp: original parallel copy tool

  • uses: libcircle
  • now includes a "bayer" branch that uses libbayer

bayer: parallel file system tools

  • libbayer: common library available to all tools
    • uses: libcircle, dtcmp, lwgrp
  • tools (so far):
    • dwalk - parallel list
    • drm - parallel remove
    • dtar - parallel tar
  • buildme scripts: commands to build libbayer and the tools (including dcp)