Training data for successful tuning run #256
Description
Hi,
I'm testing out OtterTune with MySQL and TPC-C. I have seen some conflicting issues/documentation on training data, so I just wanted to get some clarity on a few questions I had.
I have forked OtterTune and am using it with Azure MySQL (with the proper changes to the parser/code/configs to enable that). I used LHS to generate ~50 samples over 6 different knobs, then ran a few tuning loops (which ended in errors or 'not enough training data found' issues). Overall, I have about 100 points in my no_tuning_session, but when I run loops in a new tuning session, it either (1) generates no recommendation / a blank recommendation or (2) says there is not enough training data.
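For reference, here is roughly how I'm generating the LHS configurations. This is a minimal sketch; the knob names and tuning ranges below are illustrative placeholders, not my exact setup:

```python
from scipy.stats import qmc

# Placeholder tuning ranges (lower, upper) for the 6 knobs.
knob_ranges = {
    "innodb_buffer_pool_size":        (128 * 2**20, 8 * 2**30),
    "innodb_log_file_size":           (4 * 2**20, 1 * 2**30),
    "innodb_thread_concurrency":      (0, 64),
    "innodb_flush_log_at_trx_commit": (0, 2),
    "tmp_table_size":                 (16 * 2**20, 1 * 2**30),
    "max_heap_table_size":            (16 * 2**20, 1 * 2**30),
}

sampler = qmc.LatinHypercube(d=len(knob_ranges), seed=42)
unit = sampler.random(n=50)              # 50 points in [0, 1)^6
lower, upper = zip(*knob_ranges.values())
configs = qmc.scale(unit, lower, upper)  # map each column to its knob range

for knobs in configs:
    settings = dict(zip(knob_ranges, knobs.round().astype(int)))
    # ...apply `settings` to the target DB, run TPC-C for the observation
    # period, then upload the result to the no_tuning_session.
```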
1: How do we evaluate whether our training data is good? We are currently using LHS to generate configurations. Are the training points expected to follow a normal distribution?
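For context on question 1: as I understand it, LHS stratifies each knob's range, so I'd expect roughly uniform (not normal) marginal coverage, and this is the kind of sanity check I had in mind (unit ranges just for illustration):

```python
import numpy as np
from scipy.stats import qmc

# Regenerate the unit-cube LHS points so this snippet runs on its own.
samples = qmc.LatinHypercube(d=6, seed=42).random(50)  # shape (50, 6) in [0, 1)

# Marginal coverage check: with 50 LHS points and 10 equal-width bins per
# knob, each bin should hold exactly 5 points (uniform coverage, not normal).
for i in range(samples.shape[1]):
    hist, _ = np.histogram(samples[:, i], bins=10, range=(0.0, 1.0))
    print(f"knob {i}: counts per bin = {hist.tolist()}")
```

Is uniform per-knob coverage like this what the workload mapping expects, or does it want a different distribution?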
2: How many training data points are needed, on average? I have seen the tuning pipeline run with 44 training points, but I had uploaded ~100. I'm assuming this is because there were duplicate knob configurations? If duplicates are filtered out, what's the best way to ensure we have the right amount of data before starting a tuning session?
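For question 2, is something like the following the right way to check, before starting a session, how many of my uploads survive deduplication? (The file name and record layout are placeholders for however the uploaded results are stored.)

```python
import json

# Hypothetical dump of my uploaded results: one record per upload, each
# carrying the knob configuration that was applied for that observation.
with open("uploaded_results.json") as f:
    results = json.load(f)

# Two uploads should count as one training point if their knob settings match.
unique_configs = {json.dumps(r["knobs"], sort_keys=True) for r in results}
print(f"{len(results)} uploads -> {len(unique_configs)} distinct configurations")
```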
3: Is there a way to make the upload process faster for LHS samples? Uploading ~100 points would take a day or two with a 5-minute observation period (one upload currently takes ~8-10 min).
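If the 8-10 minutes per point is mostly per-upload overhead rather than the 5-minute observation window itself, would it be safe to collect all the LHS results first and then POST them concurrently? A rough sketch of what I mean; the endpoint path, upload-code parameter, and result file names are my guesses from skimming the driver, and `lhs_results/` is just a placeholder for wherever the collected samples live:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import requests

UPLOAD_URL = "http://<ottertune-server>/new_result/"  # assumed endpoint
UPLOAD_CODE = "<upload-code-for-the-session>"
RESULT_FILES = ("summary", "knobs", "metrics_before", "metrics_after")

def upload_one(result_dir):
    """POST one already-collected LHS sample to the OtterTune server."""
    files = {name: open(result_dir / f"{name}.json", "rb") for name in RESULT_FILES}
    try:
        resp = requests.post(UPLOAD_URL, data={"upload_code": UPLOAD_CODE}, files=files)
        return result_dir.name, resp.status_code
    finally:
        for handle in files.values():
            handle.close()

# Upload several samples concurrently instead of one per loop iteration.
result_dirs = sorted(p for p in Path("lhs_results").iterdir() if p.is_dir())
with ThreadPoolExecutor(max_workers=4) as pool:
    for name, status in pool.map(upload_one, result_dirs):
        print(name, status)
```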
In general, the tuning pipeline has been failing for me, and it's hard to catch the issues until you're already running a tuning session.
Thanks in advance!