plancherb1
diff --git a/DDPHelpers.cuh b/DDPHelpers.cuh
Lines changed: 0 additions & 52 deletions
diff --git a/DDPHelpers/DDPWrappers.cuh b/DDPHelpers/DDPWrappers.cuh
Lines changed: 160 additions & 16 deletions
diff --git a/DDPHelpers/LCMHelpers.cuh b/DDPHelpers/LCMHelpers.cuh
Lines changed: 575 additions & 0 deletions
diff --git a/DDPHelpers/MPCHelpers.cuh b/DDPHelpers/MPCHelpers.cuh
Lines changed: 534 additions & 459 deletions
diff --git a/DDPHelpers/bpHelpers.cuh b/DDPHelpers/bpHelpers.cuh
Lines changed: 2 additions & 2 deletions
diff --git a/DDPHelpers/fpHelpers.cuh b/DDPHelpers/fpHelpers.cuh
Lines changed: 238 additions & 89 deletions
diff --git a/DDPHelpers/nisInitHelpers.cuh b/DDPHelpers/nisInitHelpers.cuh
Lines changed: 173 additions & 84 deletions
diff --git a/README.md b/README.md
Lines changed: 33 additions & 14 deletions
@@ -306,7 +306,7 @@ void computeFSVars(T *b_ApBK, T *b_Bdu, T *s_AB, T *s_K, T *s_du, int ld_A){
 				// multiply row kx of B by column ky of du store in (kx,ky) of d
 				val += s_AB[OFFSET_B + kx + DIM_AB_r*j]*s_du[j];
 			}
-			b_Bdu[kx] = -val;
+			b_Bdu[kx] = val;
 		}
 	}
 }
@@ -327,7 +327,7 @@ void computeExpRed(T *s_dJ, T *s_H, T *s_g, T *s_du){
       		s_dJ[ind] 			 += val1;
       		s_dJ[DIM_du_r + ind] += val2;
   		#else
-  			s_dJ[0] += val1; 
+  			  s_dJ[0] += val1; 
       		s_dJ[1] += val2;
   		#endif
   	}
 
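The first hunk above changes `b_Bdu[kx]` from `-val` to `val`, so the product of `B` and `du` is now stored with its natural sign and any negation has to happen where the result is consumed. A minimal host-side C++ sketch of that column-major matrix-vector product follows. The helper name `matVecColMajor` and its parameters are hypothetical and not part of the repository; the real kernel reads `B` out of the shared `s_AB` buffer at `OFFSET_B` with leading dimension `DIM_AB_r`.

```cpp
#include <cassert>

// Hypothetical helper (not in the repository): out = B * du for a
// column-major matrix B with `rows` rows, `cols` columns, and leading
// dimension `ld` (the stride between consecutive columns).
template <typename T>
void matVecColMajor(T *out, const T *B, const T *du, int rows, int cols, int ld) {
    for (int kx = 0; kx < rows; kx++) {
        T val = 0;
        for (int j = 0; j < cols; j++) {
            // element (kx, j) of a column-major matrix lives at kx + ld * j
            val += B[kx + ld * j] * du[j];
        }
        // store the product without negation, mirroring the updated line
        out[kx] = val;
    }
}
```

For example, with `B = [1 2; 3 4]` stored column by column as `{1, 3, 2, 4}` and `du = {1, 1}`, the call `matVecColMajor(out, B, du, 2, 2, 2)` fills `out` with `{3, 7}`, where the previous sign convention would have given `{-3, -7}`.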
@@ -1,23 +1,42 @@
 # Parallel DDP for GPU/CPU #
 
-This repository holds the code supporting "[A Performance Analysis of Differential Dynamic Programming on a GPU](https://agile.seas.harvard.edu/publications/performance-analysis-parallel-differential-dynamic-programming-gpu)." It is also where experimental work is being done to extend this work.
+### Releases
+**```v0.1: WAFR 2018 Release```**
+holds the code supporting "[A Performance Analysis of Differential Dynamic Programming on a GPU](https://agile.seas.harvard.edu/publications/performance-analysis-parallel-differential-dynamic-programming-gpu)."
+
+**```v0.2: ICRA 2019 Release```** extends the previous work by integrating LCM for hardware experiments and cleaning up the code base / interface. <!--An extended abstract describing the hardware experiments can be found [here]().-->
 
 ### Structure of this Repository
-* ```WAFR_MPC_examples.cu``` and ```WAFR_MPC_examples.cu``` are the main files which run the experiments from the paper (see the comment at the top of each file for the compilation instructions)
-* ```config.h``` defines all of the settings (parallel level, plant, etc.) for an experiment
-* ```DDPHelpers.cuh``` imports all of the various helper functions and files from the following folders as needed
-* ```/DDPHelpers/*``` holds most of the functions as inlined templated CUDA header files
+* ```config.h``` defines all of the default settings (parallel level, plant, etc.) for an experiment and imports all of the various helper functions and files from the following folders as needed
+* ```/examples/*``` holds the scripts that run the WAFR examples and LCM examples (see the comment at the top of each file for the compilation instructions)
 * ```/plants/*``` holds custom rigid body dynamics and/or analytical dynamics and cost functions for currently supported plants
+* ```/DDPHelpers/*``` holds most of the functions for DDP as inlined templated CUDA header files
 * ```/utils/*``` holds a variety of support code for matrix multiplication, discrete time integrators, thread/CUDA support, etc.
-* ```/lcmtypes/*``` holds experimental LCM types for multi-computer / hardware communication
+* ```/test/*``` holds a variety of testing scripts for various function calls and derivatives (see the comment at the top of each file for the compilation instructions)
+* ```/lcmtypes/*``` holds LCM types for multi-computer / hardware communication
 
 ### Dependencies
-* [CUDA](https://developer.nvidia.com/cuda-zone) needs to be installed as code needs to be compiled with the NVCC comiler
-* For experimental multi-computer / hardware MPC code there is an additional communicaiton dependency: [LCM](https://lcm-proj.github.io/).
+* [CUDA](https://developer.nvidia.com/cuda-zone) must be installed because the code is compiled with the NVCC compiler. Currently, this code has been tested with CUDA 9 and X.
+* For multi-computer / hardware MPC code there is an additional communication dependency: [LCM](https://lcm-proj.github.io/).
+
+### Installation Tips for CUDA
+https://askubuntu.com/questions/1077061/how-do-i-install-nvidia-and-cuda-drivers-into-ubuntu
+https://www.tensorflow.org/install/gpu
+
+### To use with the Drake Kuka Simulator
+1) Install this fork of drake: [https://github.com/plancherb1/drake](https://github.com/plancherb1/drake)
+2) Add ```export DRAKE_PATH_ROOT=<path_to_drake>``` to your .bashrc
+Then the scripts in the utils folder should launch the drake visualizer and simulator
 
-### Known Bugs / Ongoing Development
-* Finite-Diff Derivatives are currently only partially implemented (and broken)
-* Small float rounding differences need to be investigated further (probably benign)
-* GPU RBDYN for Kuka only works in Euler mode -- need to introduce loops and reduce shared memory for Midpoint and RK3 (or use a brand new GPU which has double the shared memory)
-* LCM infrastructure for multi-computer / hardware MPC only partially developed (and currenlty commented out)
-* Need to catch up the CPU MPC to GPU MPC implementation
+### Known Bugs / Ongoing Development / Roadmap
+* On roadmap to develop a CPU/GPU hybrid (only the gradients on the GPU) and a fully serial CPU version without any instruction level parallelism
+* GPU RBDYN for Kuka only works in Euler mode -- need to introduce loops and reduce shared memory for Midpoint and RK3 (or use a brand new GPU which has double the shared memory) -- potential to also optimize the gradient calc to require less shared memory
+* CPU MPC suffers from resource contention when trajRunner and Goal are on the same computer -- need to improve and provide separate compile paths -- also CPU MPC Parallel Line Search has a subtle bug (in iLQR it is identical to serial but diverges in MPC -- need to debug)
+* Constraint handling / penalties need further development - would like to add full AL constraints and/or projection methods
+* Final cost shift is in development and non-functional (tied to frequency and not last goal change / shift count)
+* SLQ implementation is currently broken (and EE version needs a cost kernel)
+* EEVel rpy derivatives are currently broken (may explore forced finite diff)
+* BFGS iters may improve / stabilize the EEPos/Vel cost and should be explored
+* Square root implementation of DDP should add numerical stability and should be explored
+* Want to develop URDF > transforms and inertias tool for Arm
+* Would be nice to add a runtime and not compile time switch for Hardware vs. Sim mode and for level of parallelism (M)