realpython
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
Lines changed: 42 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 11 additions & 3 deletions
diff --git a/nlp-sentiment-analysis/README.md b/nlp-sentiment-analysis/README.md
Lines changed: 38 additions & 0 deletions
diff --git a/nlp-sentiment-analysis/requirements.txt b/nlp-sentiment-analysis/requirements.txt
Lines changed: 2 additions & 0 deletions
diff --git a/nlp-sentiment-analysis/sentiment_analyzer.py b/nlp-sentiment-analysis/sentiment_analyzer.py
Lines changed: 159 additions & 0 deletions
diff --git a/numpy-tutorial/README.md b/numpy-tutorial/README.md
Lines changed: 21 additions & 0 deletions
diff --git a/numpy-tutorial/bad-gray.jpg b/numpy-tutorial/bad-gray.jpg
274 KB
diff --git a/numpy-tutorial/blue.jpg b/numpy-tutorial/blue.jpg
123 KB
@@ -0,0 +1,42 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+:information_source: Please note that the best way to get support for Real Python courses & articles is to join one of our [weekly Office Hours calls](https://realpython.com/office-hours/) or to ask your question in the [RP Community Slack](https://realpython.com/community/).
+
+You can report issues and problems here, but we typically won't be able to provide 1:1 support outside the channels listed above.
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Screenshots**
+If applicable, add screenshots to help explain your problem.
+
+**Desktop (please complete the following information):**
+ - OS: [e.g. iOS]
+ - Browser: [e.g. Chrome, Safari]
+ - Version: [e.g. 22]
+
+**Smartphone (please complete the following information):**
+ - Device: [e.g. iPhone6]
+ - OS: [e.g. iOS8.1]
+ - Browser: [e.g. stock browser, Safari]
+ - Version: [e.g. 22]
+
+**Additional context**
+Add any other context about the problem here.
@@ -1,10 +1,18 @@
 # Real Python Materials
 
-Bonus materials, exercises, and example projects for our [Python tutorials](https://realpython.com).
+Bonus materials, exercises, and example projects for Real Python's [Python tutorials](https://realpython.com).
 
 Build Status: [![CircleCI](https://circleci.com/gh/realpython/materials.svg?style=svg)](https://circleci.com/gh/realpython/materials)
 
-## Running Code Style Checks
+## Got a Question?
+
+The best way to get support for Real Python courses, articles, and the code in this repository is to join one of our [weekly Office Hours calls](https://realpython.com/office-hours/) or to ask your question in the [RP Community Slack](https://realpython.com/community/).
+
+Due to time constraints, we cannot provide 1:1 support via GitHub. See you on Slack or on the next Office Hours call 🙂
+
+## Adding Source Code & Sample Projects to This Repo (RP Contributors)
+
+### Running Code Style Checks
 
 We use [flake8](http://flake8.pycqa.org/en/latest/) and [black](https://github.com/ambv/black) to ensure a consistent code style for all of our sample code in this repository.
 
@@ -15,7 +23,7 @@ $ flake8
 $ black --check .
 ```
 
-## Running Python Code Formatter
+### Running Python Code Formatter
 
 We're using a tool called [black](https://github.com/ambv/black) on this repo to ensure consistent formatting. On CI it runs in "check" mode to ensure any new files added to the repo are following PEP 8. If you see linter warnings that say something like "would reformat some_file.py" it means black disagrees with your formatting. 
 
 
@@ -0,0 +1,38 @@
+# Use Sentiment Analysis With Python to Classify Reviews
+
+Resources and materials for Real Python's [Use Sentiment Analysis With Python to Classify Reviews](https://realpython.com/use-sentiment-analysis-python-classify-movie-reviews/) tutorial.
+
+## Installation
+
+Create and activate a new virtual environment:
+
+```shell
+$ python -m venv .venv
+$ source .venv/bin/activate
+```
+
+Install Python dependencies into the active virtual environment:
+
+```shell
+(.venv) $ python -m pip install -r requirements.txt
+```
+
+Download the English model for spaCy:
+
+```shell
+(.venv) $ python -m spacy download en_core_web_sm
+```
+
+Download and extract the [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) compiled by [Andrew Maas](http://www.andrew-maas.net/):
+
+```shell
+$ curl -s https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz | tar xvz
+```
+
+## Usage
+
+Train the model, then get the sentiment of the movie review stored in the `TEST_REVIEW` variable:
+
+```shell
+(.venv) $ python sentiment_analyzer.py
+```
@@ -0,0 +1,2 @@
+pandas==1.1.2
+spacy==2.3.2
@@ -0,0 +1,159 @@
+import os
+import random
+import spacy
+from spacy.util import minibatch, compounding
+import pandas as pd
+
+
+TEST_REVIEW = """
+Transcendently beautiful in moments outside the office, it seems almost
+sitcom-like in those scenes. When Toni Colette walks out and ponders
+life silently, it's gorgeous.<br /><br />The movie doesn't seem to decide
+whether it's slapstick, farce, magical realism, or drama, but the best of it
+doesn't matter. (The worst is sort of tedious - like Office Space with less
+humor.)
+"""
+
+
+eval_list = []
+
+
+def train_model(
+    training_data: list, test_data: list, iterations: int = 20
+) -> None:
+    # Build pipeline
+    nlp = spacy.load("en_core_web_sm")
+    if "textcat" not in nlp.pipe_names:
+        textcat = nlp.create_pipe(
+            "textcat", config={"architecture": "simple_cnn"}
+        )
+        nlp.add_pipe(textcat, last=True)
+    else:
+        textcat = nlp.get_pipe("textcat")
+
+    textcat.add_label("pos")
+    textcat.add_label("neg")
+
+    # Train only textcat
+    training_excluded_pipes = [
+        pipe for pipe in nlp.pipe_names if pipe != "textcat"
+    ]
+    with nlp.disable_pipes(training_excluded_pipes):
+        optimizer = nlp.begin_training()
+        # Training loop
+        print("Beginning training")
+        print("Loss\tPrecision\tRecall\tF-score")
+        batch_sizes = compounding(
+            4.0, 32.0, 1.001
+        )  # Generator that yields an infinite series of batch sizes
+        for i in range(iterations):
+            print(f"Training iteration {i}")
+            loss = {}
+            random.shuffle(training_data)
+            batches = minibatch(training_data, size=batch_sizes)
+            for batch in batches:
+                text, labels = zip(*batch)
+                nlp.update(text, labels, drop=0.2, sgd=optimizer, losses=loss)
+            with textcat.model.use_params(optimizer.averages):
+                evaluation_results = evaluate_model(
+                    tokenizer=nlp.tokenizer,
+                    textcat=textcat,
+                    test_data=test_data,
+                )
+                eval_list.append(evaluation_results)
+                print(
+                    f"{loss['textcat']}\t"
+                    + "\t".join(str(v) for v in evaluation_results.values())
+                )
+
+    # Save model
+    with nlp.use_params(optimizer.averages):
+        nlp.to_disk("model_artifacts")
+
+
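The `compounding(4.0, 32.0, 1.001)` call in `train_model` drives the batch sizes that `minibatch` consumes. As a rough sketch of that behavior, assuming `compounding` multiplies the previous value by its third argument and caps the result at its second, the pure-Python stand-ins below show how slowly the batch size grows with a factor of 1.001. `compounding_sketch` and `minibatch_sketch` are hypothetical names for illustration, not spaCy functions:

```python
import itertools


def compounding_sketch(start, stop, compound):
    """Yield start, start * compound, ... capped at stop (assumed behavior)."""
    value = start
    while True:
        yield min(value, stop)
        value *= compound


def minibatch_sketch(items, sizes):
    """Split items into consecutive batches whose lengths come from sizes."""
    iterator = iter(items)
    while True:
        # Truncate the float size to an int, then take that many items
        batch = list(itertools.islice(iterator, int(next(sizes))))
        if not batch:
            return
        yield batch


# 20 stand-in training examples, matching limit=25 with an 0.8 split
sizes = compounding_sketch(4.0, 32.0, 1.001)
batch_lengths = [len(batch) for batch in minibatch_sketch(range(20), sizes)]
print(batch_lengths)  # [4, 4, 4, 4, 4]
```

Under these assumptions, the size only reaches 5 after more than 200 batches, so a small training set is processed in batches of 4 throughout.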
+def evaluate_model(tokenizer, textcat, test_data: list) -> dict:
+    reviews, labels = zip(*test_data)
+    reviews = (tokenizer(review) for review in reviews)
+    true_positives = 0
+    false_positives = 1e-8  # Can't be 0 because of presence in denominator
+    true_negatives = 0
+    false_negatives = 1e-8
+    for i, review in enumerate(textcat.pipe(reviews)):
+        true_label = labels[i]["cats"]
+        for predicted_label, score in review.cats.items():
+            # Every cats dictionary includes both labels, so you can get
+            # all the info you need from just the pos label
+            if predicted_label == "neg":
+                continue
+            if score >= 0.5 and true_label["pos"]:
+                true_positives += 1
+            elif score >= 0.5 and true_label["neg"]:
+                false_positives += 1
+            elif score < 0.5 and true_label["neg"]:
+                true_negatives += 1
+            elif score < 0.5 and true_label["pos"]:
+                false_negatives += 1
+    precision = true_positives / (true_positives + false_positives)
+    recall = true_positives / (true_positives + false_negatives)
+
+    if precision + recall == 0:
+        f_score = 0
+    else:
+        f_score = 2 * (precision * recall) / (precision + recall)
+    return {"precision": precision, "recall": recall, "f-score": f_score}
+
+
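To make the arithmetic at the end of `evaluate_model` concrete, here is a minimal sketch that applies the same precision, recall, and F-score formulas to made-up counts. The helper name `scores_from_counts` and the counts themselves are hypothetical; the `1e-8` offset mirrors the one used in `evaluate_model` to keep the denominators non-zero:

```python
EPSILON = 1e-8  # Same small offset evaluate_model adds to avoid dividing by 0


def scores_from_counts(true_positives, false_positives, false_negatives):
    """Return precision, recall, and F-score for the given counts."""
    false_positives += EPSILON
    false_negatives += EPSILON
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    if precision + recall == 0:
        f_score = 0
    else:
        f_score = 2 * (precision * recall) / (precision + recall)
    return {"precision": precision, "recall": recall, "f-score": f_score}


# Hypothetical counts: 3 true positives, 1 false positive, 2 false negatives
scores = scores_from_counts(3, 1, 2)
print({name: round(value, 3) for name, value in scores.items()})
# {'precision': 0.75, 'recall': 0.6, 'f-score': 0.667}
```

True negatives are counted in `evaluate_model` but do not enter any of the three formulas, which is why the sketch leaves them out.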
+def test_model(input_data: str = TEST_REVIEW):
+    # Load the saved trained model
+    loaded_model = spacy.load("model_artifacts")
+    # Generate prediction
+    parsed_text = loaded_model(input_data)
+    # Determine prediction to return
+    if parsed_text.cats["pos"] > parsed_text.cats["neg"]:
+        prediction = "Positive"
+        score = parsed_text.cats["pos"]
+    else:
+        prediction = "Negative"
+        score = parsed_text.cats["neg"]
+    print(
+        f"Review text: {input_data}\nPredicted sentiment: {prediction}"
+        f"\tScore: {score}"
+    )
+
+
+def load_training_data(
+    data_directory: str = "aclImdb/train", split: float = 0.8, limit: int = 0
+) -> tuple:
+    # Load from files
+    reviews = []
+    for label in ["pos", "neg"]:
+        labeled_directory = f"{data_directory}/{label}"
+        for review in os.listdir(labeled_directory):
+            if review.endswith(".txt"):
+                with open(f"{labeled_directory}/{review}") as f:
+                    text = f.read()
+                    text = text.replace("<br />", "\n\n")
+                    if text.strip():
+                        spacy_label = {
+                            "cats": {
+                                "pos": "pos" == label,
+                                "neg": "neg" == label,
+                            }
+                        }
+                        reviews.append((text, spacy_label))
+    random.shuffle(reviews)
+
+    if limit:
+        reviews = reviews[:limit]
+    split = int(len(reviews) * split)
+    return reviews[:split], reviews[split:]
+
+
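The label format and the train/test split in `load_training_data` can be tried without downloading the dataset. The sketch below uses placeholder review strings instead of files from `aclImdb/train`; `make_label` and `split_reviews` are hypothetical helpers that repeat the same logic in isolation:

```python
def make_label(label):
    """Build the cats dictionary that load_training_data attaches to a review."""
    return {"cats": {"pos": label == "pos", "neg": label == "neg"}}


def split_reviews(reviews, split=0.8, limit=0):
    """Optionally truncate to limit items, then cut at the split fraction."""
    if limit:
        reviews = reviews[:limit]
    cutoff = int(len(reviews) * split)
    return reviews[:cutoff], reviews[cutoff:]


# Placeholder data: 30 fake reviews with alternating labels
reviews = [
    (f"review {i}", make_label("pos" if i % 2 else "neg")) for i in range(30)
]
train, test = split_reviews(reviews, limit=25)
print(len(train), len(test))  # 20 5
print(train[0][1])  # {'cats': {'pos': False, 'neg': True}}
```

With `limit=25`, as used in the `__main__` block, only 5 reviews are left for evaluation, so the reported precision and recall will be very coarse.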
+if __name__ == "__main__":
+    train, test = load_training_data(limit=25)
+    print("Training model")
+    train_model(train, test)
+    df = pd.DataFrame(eval_list)
+    df.plot()  # Requires matplotlib, which is not in requirements.txt
+    print("Testing model")
+    test_model()
@@ -0,0 +1,21 @@
+# NumPy Tutorial: First Steps in Data Science
+
+This folder contains the sample code for the [NumPy Tutorial](https://realpython.com/numpy-tutorial/) by @rpalo.
+
+## Installing
+
+First, install the dependencies:
+
+```shell
+$ python -m pip install -r requirements.txt
+```
+
+## Usage
+
+These examples all make use of the [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/). To run the examples, make sure the correct virtual environment is active (you may have to deactivate and reactivate after installing requirements), and run:
+
+```shell
+$ jupyter notebook
+```
+
+You should see listings for each of the example files. You can open them up and run them cell by cell to see how they perform. Feel free to make small changes to the code to see how those affect things.