
Conversation


@jlealtru jlealtru commented Jul 5, 2019

Adding tutorials on pitchfork data and some old code.

@datawrestler (Owner) left a comment


Overall, good start. A few things to address:

- add intro sections to both scripts
- take advantage of headers to break things up
- change the training process to iteratively unfreeze weights
- possibly check out fastprogress
- use relative paths
- never put keys/secrets in source code again

],
"source": [
"print(os.getcwd())\n",
"path='/media/jlealtru/data_files/github/Tutorials/TextAnalytics/pitchfork_data'"
@datawrestler (Owner) commented:

Use relative paths. Add either a standalone script that fetches the data from its source, or do it in an intro section; either way, show how to download the source data directly so all of your steps can be reproduced.
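A minimal sketch of what that intro cell could look like, assuming the Kaggle CLI is installed and the dataset slug and archive file name (`database.sqlite`) match the actual Pitchfork dataset — both are assumptions here, not verified values:

```python
from pathlib import Path
import subprocess

# Relative to the notebook/script location, so the repo stays portable.
DATA_DIR = Path("pitchfork_data")

def ensure_data(dataset: str = "nolanbconaway/pitchfork-data") -> Path:
    """Download and unzip the Kaggle dataset into DATA_DIR if it isn't there yet."""
    DATA_DIR.mkdir(exist_ok=True)
    marker = DATA_DIR / "database.sqlite"  # hypothetical expected file
    if not marker.exists():
        subprocess.run(
            ["kaggle", "datasets", "download", "-d", dataset,
             "-p", str(DATA_DIR), "--unzip"],
            check=True,
        )
    return DATA_DIR
```

Anyone cloning the repo can then run `ensure_data()` once instead of editing a machine-specific absolute path.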

"learn_classifier.freeze_to(-2)\n",
"lr /= 2\n",
"learn_classifier.fit_one_cycle(1, slice(lr/(2.6**4),lr), moms=(0.8,0.7))\n",
"#learn_classifier.fit_one_cycle(2, slice(1e-4/2,1e-2/2), moms=(0.8,0.7))"
@datawrestler (Owner) commented:

Look at fastprogress - I think folks would find it really interesting to be able to iteratively build a training graph as training progresses.
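One way to feed such a graph: collect a smoothed loss curve during the loop and hand the `(xs, ys)` pairs to fastprogress's `master_bar.update_graph` each batch. The smoothing below is plain Python (an exponentially weighted average with bias correction, similar in spirit to what fastai's recorder does), so it runs without fastprogress installed:

```python
def smooth_losses(losses, beta=0.9):
    """Return (xs, ys): batch indices and an exponentially smoothed, debiased loss curve."""
    avg, ys = 0.0, []
    for i, loss in enumerate(losses, start=1):
        avg = beta * avg + (1 - beta) * loss
        ys.append(avg / (1 - beta ** i))  # bias correction for early batches
    return list(range(1, len(losses) + 1)), ys

xs, ys = smooth_losses([2.0, 1.5, 1.2, 1.1, 1.0])
# mb.update_graph([[xs, ys]])  # inside a fastprogress master_bar loop
```

The `mb.update_graph` call is shown commented out because it only works inside an active `master_bar` context.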

"metadata": {},
"outputs": [],
"source": [
"learn_classifier.unfreeze()\n",
@datawrestler (Owner) commented:

The fastai folks recommend unfreezing layers gradually: start with -1, then -2, then -3, then unfreeze all. That will likely help.
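A sketch of that schedule as data, assuming the ULMFiT-style recipe of halving the base learning rate at each stage (the function and dict keys here are illustrative, not fastai API):

```python
def unfreeze_schedule(n_groups: int, base_lr: float = 1e-2):
    """Gradual-unfreezing stages: -1, -2, ..., then all groups, halving the LR each time."""
    stages, lr = [], base_lr
    for k in range(1, n_groups + 1):
        freeze_to = -k if k < n_groups else None  # None => unfreeze everything
        stages.append({"freeze_to": freeze_to, "lr": lr})
        lr /= 2
    return stages

for stage in unfreeze_schedule(4):
    # In fastai this would map to learn.freeze_to(stage["freeze_to"]) or
    # learn.unfreeze(), then learn.fit_one_cycle(1, slice(stage["lr"]/(2.6**4), stage["lr"]))
    print(stage)
```

This keeps the stage/LR bookkeeping in one place instead of copy-pasted cells that each halve `lr` by hand.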

"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial we are going to implement ULMFiT, a transfer learning model for text. \n",
@datawrestler (Owner) commented:

Format this markdown, add a TOC with hyperlinks, and include additional sources to review.

"# username \n",
"os.environ['KAGGLE_USERNAME'] = \"jlealtru\" \n",
"# key\n",
"os.environ['KAGGLE_KEY'] = \"<redacted>\""
@datawrestler (Owner) commented:

@jlealtru never post secrets/keys in source code. You have a couple of options. The easiest, although not the safest, is creating a separate file (kept out of version control) and importing it, referencing only the variable name in the code. Alternatively, you can leverage something like Azure Key Vault (easy to use, super powerful - think of 1Password or LastPass, except at scale/programmatically).
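A minimal sketch of the separate-file approach, assuming a gitignored `kaggle.json` holding `{"username": "...", "key": "..."}` (that file name and layout match the Kaggle CLI's own convention, but treat them as an assumption here):

```python
import json
import os
from pathlib import Path

def load_kaggle_creds(path: str = "kaggle.json") -> None:
    """Read credentials from a file outside version control and export them as env vars."""
    creds = json.loads(Path(path).read_text())
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
```

The notebook then calls `load_kaggle_creds()` and never contains the key itself, so the checked-in history stays clean.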

],
"source": [
"#learn.fit_one_cycle(10, 2e-3, moms=(0.8,0.7), wd=0.1)\n",
"learn_pitchfork.fit_one_cycle(12, 2e-3/3, moms=(0.8,0.7), wd= 0.1)"
@datawrestler (Owner) commented:

Again: iteratively unfreeze layers and train, tracking progress with something like fastprogress.
