Pythonically facilitate laborious file management, distributed computing, scripting and deep learning workflows.
With this cluster management tool, old laptops and desktops get a second life as cluster workhorses, to which you can offload heavy computations from your personal laptop.
- The repository containing the executed function is automatically zipped and copied to each remote.
- Binary data is automatically transferred to each remote via SFTP.
- The specs of each resource are inspected, and the workload is distributed across the cluster accordingly.
- Email notifications: optionally get notified when your submitted jobs start and finish.
- Resource locking: a job can optionally hold the resources to itself, in which case other submitted jobs have to wait. This lets you send an arbitrary number of jobs in one go without ever worrying about overwhelming the remote. You come back later and collect all the results.
- A Zellij session with a reasonable layout is fired up automatically on each remote.
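The README does not spell out how resource specs translate into shares of the work. The following is a minimal sketch that assumes CPU count is the only spec that matters. It splits a batch of jobs with the largest-remainder method. `split_workload` is a hypothetical helper, not part of crocodile, and the real scheduler may weigh specs differently.

```python
def split_workload(n_jobs: int, cpu_counts: dict[str, int]) -> dict[str, range]:
    """Hypothetical helper: give each machine a contiguous range of job indices,
    sized in proportion to its CPU count (assumed to be the only spec that matters)."""
    total_cpus = sum(cpu_counts.values())
    if total_cpus <= 0:
        raise ValueError("the cluster needs at least one CPU")
    # Ideal fractional share per machine, then rounded down.
    exact = {name: n_jobs * cpus / total_cpus for name, cpus in cpu_counts.items()}
    shares = {name: int(share) for name, share in exact.items()}
    # Largest-remainder method: hand leftover jobs to the biggest fractional parts.
    leftover = n_jobs - sum(shares.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - shares[name], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    # Convert the per-machine counts into consecutive index ranges.
    ranges, start = {}, 0
    for name, count in shares.items():
        ranges[name] = range(start, start + count)
        start += count
    return ranges


plan = split_workload(100, {"old_laptop": 4, "desktop": 12, "workstation": 16})
for machine, jobs in plan.items():
    print(f"{machine}: {len(jobs)} jobs, indices {jobs.start}..{jobs.stop - 1}")
```

Contiguous ranges keep each remote's share easy to describe, for example as a start and stop index passed to the function being executed.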
Croshell aims to facilitate the use of Python for scripting, thus offering an alternative to PowerShell and Bash. Their commands are absurdly complex: nothing but a jumble of ad-hoc developments piled up over decades to save some programmers a keystroke or two. This heritage places a huge burden on people coming into the computer science field. A full rant bashing those shells, by Brian Will, is here.
The core rationale is:
- No one has the time to sit through hours-long tutorials on how powerful and versatile `ls` or `grep` are, let alone keep their random syntax in mind (unless used on a daily basis).
- The Python shell, on the other hand, offers benign syntax and eminent readability, but at the rather hefty cost of terseness, or rather the lack of it. For example, to make up for just `ls`, you need to import some libraries, which eventually sets you back a couple of lines of code. That's not acceptable for the simple task of listing directory contents, let alone compressing a directory.
- Crocodile comes in here to make Python terser and friendlier by offering functionality for everyday use, like file management, SSH, environment variable management, etc. In essence, croshell is to IPython what IPython is to the Python shell. The basic Python shell, which can only do arithmetic, is turbo-boosted, making it perfect for everyday errands.
- Used in your code, the library will fill your life with one-liners and take your code to an artistic level of brevity and readability. It also makes you more productive by sparing you needless boilerplate.
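To make that cost concrete, here is roughly what listing and zipping a directory take with the standard library alone. `list_dir` and `zip_dir` are illustrative names, not crocodile functions.

```python
import shutil
import tempfile
from pathlib import Path


def list_dir(directory: str) -> list[str]:
    """Illustrative stand-in for `ls`: sorted names of the entries in `directory`."""
    return sorted(entry.name for entry in Path(directory).iterdir())


def zip_dir(directory: str) -> Path:
    """Illustrative helper: compress `directory` into `<directory>.zip` beside it."""
    src = Path(directory)
    # make_archive appends ".zip" to base_name and returns the archive path as a string.
    archive = shutil.make_archive(str(src), "zip", root_dir=str(src))
    return Path(archive)


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo"
    demo.mkdir()
    (demo / "a.txt").write_text("hello")
    (demo / "b.txt").write_text("world")
    print(list_dir(str(demo)))      # ['a.txt', 'b.txt']
    print(zip_dir(str(demo)).name)  # demo.zip
```

Each helper is short on its own. The overhead is in the imports and small wrappers that every script has to repeat.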
The name crocodile signifies the use of brute force in its implementation. The focus is on ease of use, as opposed to beating the existing shells in speed.
Mind you, speed is not an issue in 99% of everyday chores.
Crocodile is carefully designed to be loved; its learning curve couldn't be flattened any further.
This package extends many native Python classes to equip you with a power that is hard to tame. The major classes extended are:
- `pathlib.Path` is extended to `P`
  - Forget about importing all the archaic Python libraries `os`, `glob`, `shutil`, `sys`, `zipfile`, etc. `P` makes the path an object, not a lame string. `P` objects are incredibly powerful for parsing paths; no more than one line of code is required for any operation. Take a squint at this one-line file wrangler, which:
    - gets a temporary file name
    - writes `lol` text to it
    - copies it to the same location (with a suffix like `_copy1`)
    - moves it to the parent directory
    - converts the user home to `~`
    - zips it
    - deletes it
    - touches it
    - goes to its parent
    - searches for all files in it and selects the first one
    - uploads it to the cloud (transfer.sh)
    - opens the browser with the URL
    - downloads it (by default it goes to `~/Downloads`)
    - encrypts it with a password
    - creates a symlink to it from `~/toy`
    - resolves the symbolic link
    - calculates the checksum of the file

```python
P.tmpfile().write_text("lol").copy().move("..", rel2it=True).collapseuser().zip().delete(sure=True).touch().parent.search("*", folders=False)[0].share_on_cloud()().download().encrypt(pwd="haha").symlink_from("~/toy").resolve().checksum()
```

Paths can also be indexed and sliced like a sequence of their parts:

```python
>>> path = P("dataset/type1/meta/images/file3.ext")
>>> path[0]  # allows indexing! makes sense, huh?
P("dataset")
>>> path[-1]  # nifty!
P("file3.ext")
>>> path[2:-1]  # even slicing!
P("meta/images")
```

- `list` is extended to `List`
  - Forget that `for` loops exist: with this class, `for` loops are used implicitly to apply a function to all items. Inevitably, while programming, you will encounter many objects of the same type and struggle to get a firm grip on them. `List` is a powerful structure that puts at your disposal a grip so firm that the objects at hand start behaving like one object. The behaviour is à la the JavaScript `forEach` method of Arrays.
- `dict` is extended to `Struct`
  - Combines the power of dot notation, like classes, with key access, like dictionaries.
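Standard `pathlib` paths cannot be indexed. The following is a minimal sketch of how the indexing and slicing shown for `P` above can be layered on top of `pathlib`. The class name `IndexablePath` is hypothetical, and this is not crocodile's implementation. Ordinary Python slicing semantics apply, so a negative stop leaves out the last part.

```python
from pathlib import PurePosixPath


class IndexablePath(PurePosixPath):
    """Hypothetical path class whose parts can be indexed and sliced like a tuple."""

    def __getitem__(self, index):
        parts = self.parts[index]  # same semantics as indexing/slicing a tuple
        if isinstance(index, slice):
            # Rebuild a path from the selected parts; an empty slice yields ".".
            return type(self)(*parts)
        return type(self)(parts)


path = IndexablePath("dataset/type1/meta/images/file3.ext")
print(path[0])     # dataset
print(path[-1])    # file3.ext
print(path[2:-1])  # meta/images
print(path[2:])    # meta/images/file3.ext
```

Returning `type(self)` rather than a plain string keeps the result a path object, so further path methods can be chained on it.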
Additionally, the package provides many other new classes, e.g. `Read` and `Save`. Together with `P`, they provide comprehensive support for file management; life cannot get easier with those. Every class inherits attributes that allow saving and loading in one line.
Furthermore, those classes are inextricably connected. For example, globbing a path `P` object returns a `List` object. You can move back and forth between `List`, `Struct`, and `DataFrame` with one method, and so on.
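The following is a minimal sketch of two ideas from this section: dot-notation access on top of a `dict`, and one-line saving and loading. It assumes JSON as the storage format. `AttrDict`, `save`, and `load` are hypothetical names, and crocodile's `Struct` and its save/load methods may behave differently.

```python
import json
import tempfile
from pathlib import Path


class AttrDict(dict):
    """Hypothetical dict whose keys double as attributes, with one-line JSON save/load."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so real dict methods still take priority.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None

    def save(self, path):
        """Write the contents as JSON (values must be JSON-serializable); return the path."""
        path = Path(path)
        path.write_text(json.dumps(self, indent=2))
        return path

    @classmethod
    def load(cls, path):
        """Read a JSON object back; nested objects come back as plain dicts."""
        return cls(json.loads(Path(path).read_text()))


config = AttrDict(lr=0.001, epochs=10)
config.batch_size = 32
print(config.lr, config["epochs"], config.batch_size)  # 0.001 10 32
with tempfile.TemporaryDirectory() as tmp:
    restored = AttrDict.load(config.save(Path(tmp) / "config.json"))
    print(restored == config, restored.batch_size)  # True 32
```

One caveat of this design: a key that shares its name with a dict method (such as `items`) is only reachable with bracket access.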
- Deep Learning Modules
  - A paradigm that facilitates working with deep learning models, based on a tripartite scheme:
    - HyperParameters: facilitated through the `HParams` class.
    - Data: facilitated through the `DataReader` class.
    - Model: `BaseModel` is a frontend for both `TensorFlow` and `PyTorch` backends, and works in tandem with the two classes above.
  - The aforementioned classes cooperate to offer a seamless workflow for creating, training, and saving models.
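In crocodile, these roles are played by `HParams`, `DataReader`, and `BaseModel` on top of TensorFlow or PyTorch. The sketch below only mimics that division of responsibilities in plain Python, with a closed-form linear fit standing in for a neural network. Every class name here is hypothetical, and none of it reflects crocodile's actual API.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class HyperParams:
    """Hypothetical hyperparameter container: every tunable setting lives here."""
    name: str = "linear_demo"
    test_fraction: float = 0.25


class DataSource:
    """Hypothetical data holder: splits the dataset as the hyperparameters dictate."""

    def __init__(self, hp: HyperParams, xs: list[float], ys: list[float]):
        n_test = int(len(xs) * hp.test_fraction)
        self.test = (xs[:n_test], ys[:n_test])
        self.train = (xs[n_test:], ys[n_test:])


class LinearModel:
    """Hypothetical model wrapper: fits y = a*x + b by least squares, then saves itself."""

    def __init__(self, hp: HyperParams, data: DataSource):
        self.hp, self.data = hp, data
        self.a = self.b = 0.0

    def fit(self) -> "LinearModel":
        xs, ys = self.data.train
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var = sum((x - mean_x) ** 2 for x in xs)
        self.a = cov / var
        self.b = mean_y - self.a * mean_x
        return self

    def evaluate(self) -> float:
        """Mean squared error on the held-out split."""
        xs, ys = self.data.test
        return sum((self.a * x + self.b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def save(self, directory) -> Path:
        """Store hyperparameters and weights together so the run can be reproduced."""
        path = Path(directory) / f"{self.hp.name}.json"
        path.write_text(json.dumps({"hparams": asdict(self.hp), "weights": [self.a, self.b]}))
        return path


hp = HyperParams()
xs = [float(x) for x in range(8)]
data = DataSource(hp, xs, [2 * x + 1 for x in xs])
model = LinearModel(hp, data).fit()
print(model.a, model.b, model.evaluate())  # 2.0 1.0 0.0
with tempfile.TemporaryDirectory() as tmp:
    print(json.loads(model.save(tmp).read_text())["weights"])  # [2.0, 1.0]
```

Keeping the hyperparameters in one object that both the data and model stages read makes it natural to save them next to the weights.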
In the command line:

```bash
pip install crocodile
```
Because the package is a thin extension on top of almost pure Python, you need not worry about your venv. The package is not aggressive in its requirements: it installs itself peacefully and never interferes with your other packages. If you do not have numpy, matplotlib, and pandas, it simply throws an `ImportError` at runtime, that's it.
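This behaviour is the usual deferred-import pattern: an `ImportError` is raised only when a feature that needs the missing package is actually used. The following is a minimal sketch in which `require` and `column_means` are hypothetical helpers rather than crocodile code.

```python
import importlib
from types import ModuleType


def require(module_name: str) -> ModuleType:
    """Import an optional dependency at call time instead of at package import time."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(
            f"{module_name!r} is required for this feature; install it with pip"
        ) from exc


def column_means(rows: list[dict]) -> dict:
    """Hypothetical pandas-backed feature: defining it never needs pandas,
    only calling it does."""
    pd = require("pandas")
    return pd.DataFrame(rows).mean().to_dict()
```

A module built this way imports cleanly on a bare interpreter. Only a call such as `column_means(...)` fails, and it does so with a message naming the missing package.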
For Windows machines, run the following in an elevated PowerShell:
Warning: this includes a dotfiles manager that you might not want.

```powershell
Invoke-WebRequest https://raw.githubusercontent.com/thisismygitrepo/machineconfig/main/src/machineconfig/setup_windows/croshell.ps1 | Invoke-Expression
```

That's as easy as taking candy from a baby. Whenever you start a Python file, preface it with the following in order to unleash the library:
EX1: Get a list of the `.exe` files available in the terminal.

```python
P.get_env().PATH.search('*.exe').reduce(lambda x, y: x+y).print()
```

EX2: Suppose you want to know how many lines of code are in your repository. The procedure is to glob all `.py` files recursively, read each file's code as a string, split each string into lines, count the lines, and add up the counts from all files.
To achieve this, all you need is an eminently readable one-liner:

```python
P.cwd().search("*.py", r=True).read_text().split('\n').apply(len).to_numpy().sum()
```

How does this make perfect sense?
- `search` returns a `List` of `P` path objects.
- `read_text` is a `P` method, but it is being run against a `List` object. Behind the scenes, the responsible black magic fails to find such a method in `List` and realizes it is a method of the items inside the list. It therefore runs the method against each item, reading all the files, collects the results in another `List` object, and returns it.
- A similar story applies to `split`, which is a method of strings in Python.
- Next, `apply` is a method of `List`. Sure enough, it lives up to its apt name: it applies the passed function `len` to all items in the list and returns another `List` object containing the results.
- `.to_numpy()` converts the `List` to a `numpy` array, and `.sum` is a method of `numpy` arrays, which gives the final result.
Method names such as `apply` and `to_numpy` are inspired by the popular pandas library, resulting in an almost non-existent learning curve.
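The "black magic" that forwards `read_text` and `split` to the items can be approximated with `__getattr__`, which Python calls only when normal attribute lookup fails. The following is a minimal sketch under that assumption. The class `ItemList` is hypothetical and far simpler than crocodile's `List`. It replays the line-count pipeline on in-memory strings instead of files.

```python
from typing import Any, Callable


class ItemList(list):
    """Hypothetical list that forwards unknown attributes and methods to its items."""

    def apply(self, func: Callable[[Any], Any]) -> "ItemList":
        """Apply `func` to every item and collect the results in a new ItemList."""
        return ItemList(func(item) for item in self)

    def __getattr__(self, name: str):
        # Reached only when `name` is not found on the list itself.
        if name.startswith("__"):
            raise AttributeError(name)  # leave copy/pickle protocol lookups alone
        attrs = ItemList(getattr(item, name) for item in self)
        if all(callable(attr) for attr in attrs):
            # Item methods (or an empty list): call each bound method with the same arguments.
            return lambda *args, **kwargs: ItemList(a(*args, **kwargs) for a in attrs)
        return attrs  # plain attributes: return the collected values


sources = ItemList(["import os\nprint(os.getcwd())\n", "x = 1\n"])
line_counts = sources.split("\n").apply(len)
print(line_counts)       # [3, 2]
print(sum(line_counts))  # 5
```

As in the original one-liner, splitting on `'\n'` counts the empty string after a trailing newline, so each file contributes one more than its number of newline characters.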
Please refer to the main git repo here.
Click Here
Alex Al-Saffar. email