
PimLb/control_1D_crawler

 
 


Control a 1D crawler

Source code used to produce the results discussed in the paper <paper reference when published> and in the preprint <arXiv preprint>.

Training, storing and producing stats of learned policies

Run the command python getLearningStats.py <input.prm>. The input file contains all the information needed to set up the simulator (e.g. crawler length, simulation box size, number of suckers, etc.) and the learning campaign (scheduling and "exploration" phase parameters). Some useful parameters are hard-coded in the script itself. Important parameters to monitor are the initial number of steps per episode and the increment in steps applied after each failed convergence attempt (the maximum number of attempts is fixed by the "max_attempts" parameter).
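The retry logic described above (a starting episode length, an increment after each failed convergence attempt, and a cap given by "max_attempts") can be sketched as follows. This is only an illustration of the idea, not the code in "getLearningStats.py": the function name learn_with_retries, the train_once callback and the argument names initial_steps and steps_increment are hypothetical.

```python
from typing import Callable, Tuple


def learn_with_retries(
    train_once: Callable[[int], bool],  # hypothetical: runs one learning attempt, True if converged
    initial_steps: int,                 # assumed name for the initial steps per episode
    steps_increment: int,               # assumed name for the increment after a failed attempt
    max_attempts: int,
) -> Tuple[bool, int, int]:
    """Retry learning, lengthening the episodes after each failed attempt.

    Returns (converged, attempts_used, steps_per_episode_of_last_attempt).
    """
    steps = initial_steps
    for attempt in range(1, max_attempts + 1):
        if train_once(steps):
            return True, attempt, steps
        # Lengthen the episodes only if another attempt will follow.
        if attempt < max_attempts:
            steps += steps_increment
    return False, max_attempts, steps


if __name__ == "__main__":
    # Toy stand-in for a learning run: "converges" once episodes are long enough.
    converged, attempts, steps = learn_with_retries(
        lambda n_steps: n_steps >= 3000,
        initial_steps=1000,
        steps_increment=1000,
        max_attempts=5,
    )
    print(converged, attempts, steps)
```

In this sketch the episode length grows linearly with the number of failed attempts; the actual schedule used by the script may differ.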

At the end of the run a number of outputs are produced, some of which can be further analysed with the routines provided in the "utilities" folder. For instance, the best observed policy is stored as a pickled dictionary, which can be loaded and run.
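A minimal sketch of reading such a pickled dictionary back with the standard library is given below. The file name and the layout of the dictionary (here, a mapping from a state tuple to an action index) are assumptions made for illustration only; check the actual output of "getLearningStats.py" for the real names and structure.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_policy(policy: Dict[Any, Any], path: Path) -> None:
    """Write a policy dictionary to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(policy, fh)


def load_policy(path: Path) -> Dict[Any, Any]:
    """Load a pickled policy and check that it is a dictionary."""
    with open(path, "rb") as fh:
        policy = pickle.load(fh)
    if not isinstance(policy, dict):
        raise TypeError(f"expected a dict in {path}, got {type(policy).__name__}")
    return policy


if __name__ == "__main__":
    # Hypothetical policy layout: state tuple -> action index.
    toy_policy = {(0, 1): 0, (1, 0): 1}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "best_policy.pkl"  # hypothetical file name
        save_policy(toy_policy, path)
        print(load_policy(path))
```

Since unpickling can execute arbitrary code, only load policy files that come from a trusted source.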

Main Objects and functions

The best way to understand how to use the code is to go through "getLearningStats.py" and its comments. The general logic is that an "Environment" object is created, which is the simulator comprising the 1D crawler with all its assigned parameters. An "actionValue" object can then be created, which is essentially the Q matrix used for learning. The "actionValue" object provides a number of methods for analysing a policy. Policies can also be loaded by hand, for instance from a saved policy, for analysis purposes without the need to (re)learn them. After learning, the "actionValue" object contains several saved policies (it stores a number of sub-optimal policies, as explained in the paper). Each of them can be accessed and played to gather statistics and performance measures. This is well illustrated by the "getLearningStats.py" script.

Other examples of how to use learned policies, and of the various types of analysis that can be run on them, are found in the scripts contained in the "utilities" folder. These analysis scripts call functions contained in the "analysis_utilities.py" script. For instance, a useful analysis tool contained in "analysis_utilities.py" is the "getPolicyStats()" function, which measures the performance of the stored policies and calls, among other functions, the "countPolicies()" function, which establishes which unique policies were found.

Another important tool used in the paper is the function "policyRobustnessStudy()", which takes a list of policies as input. The list of policies can be built by calling the routine "policyImporter()", which takes as argument the name of the folder containing all the policies that one wants to compare. In the paper we used the best policies found for each architecture, which are a standard output of the "getLearningStats.py" script (one call per architecture, using the dedicated input file, for instance those provided in the "examples" folder). The function "policyRobustnessStudy()" uses "robustnessAnalysis()" as its core function. This function encompasses a number of methods to assess the robustness of a policy to sucker failures, including those used in the paper.
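To illustrate what establishing the unique policies among the stored ones can look like, here is a small self-contained sketch. It is not the implementation of "countPolicies()" in "analysis_utilities.py": the function names policy_key and count_unique_policies, and the representation of a policy as a dictionary mapping states to actions, are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

# Assumed representation: a policy maps each (hashable) state to an action.
Policy = Dict[Hashable, Hashable]


def policy_key(policy: Policy) -> Tuple[Tuple[Hashable, Hashable], ...]:
    """Build a hashable key that does not depend on dictionary insertion order."""
    return tuple(sorted(policy.items(), key=repr))


def count_unique_policies(policies: Iterable[Policy]) -> List[Tuple[Policy, int]]:
    """Return each distinct policy with its number of occurrences, most frequent first."""
    counts = Counter(policy_key(p) for p in policies)
    return [(dict(key), n) for key, n in counts.most_common()]


if __name__ == "__main__":
    stored = [
        {(0, 1): 0, (1, 0): 1},
        {(1, 0): 1, (0, 1): 0},  # same policy, different insertion order
        {(0, 1): 1, (1, 0): 1},
    ]
    for policy, n in count_unique_policies(stored):
        print(n, policy)
```

Two policies are treated as identical here only if they prescribe the same action in every state; a comparison with a tolerance, or one restricted to the states actually visited, would need a different key.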
