18 commits
4 changes: 2 additions & 2 deletions README.md
@@ -43,7 +43,7 @@ This is a large, difficult project. Start early, and get help when you need it.
Sometimes this button takes a little bit to show up when you first open VS Code. If you're not seeing it, make sure you have the Java extension pack installed and it is active.
1. It should ask you for a filename. Give it the following filename:
```
wikipediaData.txt
keatsTraining.txt
```
Then hit enter.
1. It should ask you for a number of words. Enter a positive integer and hit enter.
@@ -57,7 +57,7 @@ Sometimes this button takes a little bit to show up when you first open VS Code.
1. Open the testing side panel by clicking on the beaker on the left of your screen. ![Test Runner Sidebar in VS Code](images/test_runner.png)
1. Hover over `ramblebot`. A few grey triangles should appear. Click the triangle the furthest to the left.
1. You should expect to see all the tests fail. This is good! You haven't written your solution yet, so it's expected for them to fail.
1. Validate that you can push to your repo by making any change to this README, adding, committing, and pushing it.
1. Validate that you can push to your repo by making any change to this README, adding, committing, and pushing it...

## Understanding the Project

37 changes: 34 additions & 3 deletions src/LowercaseSentenceTokenizer.java
@@ -1,3 +1,4 @@
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

@@ -30,7 +31,37 @@ public class LowercaseSentenceTokenizer implements Tokenizer {
*/
public List<String> tokenize(Scanner scanner) {
// TODO: Implement this function to convert the scanner's input to a list of words and periods
return null;
}
}
// 1: Make a list for tokens (String)
List<String> tokensList = new ArrayList<String>();

// 2: Grab each word in the text and put into list
while (scanner.hasNextLine())
{
// if word ends with a period, split it between the period and the letter before it -

/*if (scanner.next().endsWith("."))
{
String[] wordArray = scanner.next().split("");
if (wordArray[wordArray.length-1] == ".")
{
tokensList.add(scanner.next().toLowerCase());
}
}
*/
Comment on lines +42 to +50

Remember to delete unneeded comments once you're done with them

String word = scanner.next();

if (word.endsWith("."))
{
String wordWithoutPeriod = word.replace(".", "");
tokensList.add(wordWithoutPeriod);
tokensList.add(".");
Comment on lines +53 to +57

Nice logic!

}
else
{
tokensList.add(word.toLowerCase());
}
}

return tokensList;
}
}
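For reference, a minimal sketch of the same split-off-the-period approach, written as a standalone class so it compiles without the project's `Tokenizer` interface (the class name `TokenizerSketch` is hypothetical). It assumes every token, including a word that precedes a period, should be lowercased, as the class name `LowercaseSentenceTokenizer` suggests. It also loops on `hasNext()` rather than `hasNextLine()`: when the input ends with a newline, `hasNextLine()` can still report a line after the last word has been read, and the following `next()` call then throws `NoSuchElementException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical standalone version of LowercaseSentenceTokenizer.tokenize
class TokenizerSketch {

    static List<String> tokenize(Scanner scanner) {
        List<String> tokens = new ArrayList<>();

        // hasNext() checks for another whitespace-delimited token, so a
        // trailing newline in the input cannot lead to a failing next() call
        while (scanner.hasNext()) {
            String word = scanner.next().toLowerCase();

            if (word.endsWith(".")) {
                // Drop only the final period, so "dr.smith's." keeps its inner dot
                String withoutPeriod = word.substring(0, word.length() - 1);
                if (!withoutPeriod.isEmpty()) {
                    tokens.add(withoutPeriod);
                }
                tokens.add(".");
            } else {
                tokens.add(word);
            }
        }
        return tokens;
    }
}
```

Run on the contents of `training.txt` followed by a newline, this sketch yields `[hello, world, ., this, is, dr.smith's, hello, example, .]`; note that `Dr.Smith's` stays a single token because it does not end with a period.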
13 changes: 12 additions & 1 deletion src/LowercaseSentenceTokenizerTest.java
@@ -17,8 +17,19 @@ void testTokenizeWithNoCapitalizationOrPeriod() {

// Wave 2
/*
* Write your test here!
* Write your test here! everything works for initial commit
*/
@Test
void testCodeHandlesInputWithManyCase()
{
// Arrange
LowercaseSentenceTokenizer tokenizer = new LowercaseSentenceTokenizer();
Scanner scanner = new Scanner("hi hi hi");
List<String> tokens = tokenizer.tokenize(scanner);

// Act & Assert
assertEquals(List.of("hi", "hi", "hi"), tokens);
}
Comment on lines +22 to +32

This doesn't quite test what we were looking for. We wanted to see whether your code could handle multiple spaces in a row, e.g. `hi   hi  hello` (several spaces between the words).
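The behaviour this comment asks about comes from `java.util.Scanner`: its default delimiter matches one or more whitespace characters, so `next()` treats a run of spaces, tabs, or newlines as a single separator. The sketch below (the class `WhitespaceDemo` and method `collect` are hypothetical) collects tokens the same way a `Scanner`-based tokenizer loop does and shows that repeated whitespace produces no empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper: collects tokens the way a Scanner-based tokenizer loop would
class WhitespaceDemo {

    static List<String> collect(String input) {
        List<String> tokens = new ArrayList<>();
        Scanner scanner = new Scanner(input);
        // The default delimiter matches one or more whitespace characters,
        // so consecutive spaces never yield an empty token
        while (scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        scanner.close();
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(collect("hi   hi  hello"));      // prints [hi, hi, hello]
        System.out.println(collect("  hi\t\thi\nhello "));  // prints [hi, hi, hello]
    }
}
```

A Wave 2 test along these lines would then assert something like `assertEquals(List.of("hi", "hi", "hello"), tokenizer.tokenize(new Scanner("hi   hi  hello")));` against `LowercaseSentenceTokenizer` itself.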



// Wave 3
50 changes: 49 additions & 1 deletion src/UnigramWordPredictor.java
@@ -3,6 +3,7 @@
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Random;

/**
* A class for predicting the next word in a sequence using a unigram model.
@@ -52,6 +53,26 @@ public void train(Scanner scanner) {
List<String> trainingWords = tokenizer.tokenize(scanner);

// TODO: Convert the trainingWords into neighborMap here
neighborMap = new HashMap<String, List<String>>();

for (int i=0; i < trainingWords.size()-1; i++)
{

List<String> wordFollowUpList = new ArrayList<String>();

if (!neighborMap.containsKey(trainingWords.get(i)))
{
wordFollowUpList.add(trainingWords.get(i+1));
neighborMap.put(trainingWords.get(i), wordFollowUpList);
}
else
{
List<String> currentWordsList = neighborMap.get(trainingWords.get(i));
currentWordsList.add(trainingWords.get(i+1));
neighborMap.put(trainingWords.get(i), currentWordsList);
}
Comment on lines +63 to +73

Nice logic!

}
System.out.println(neighborMap);
}
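The if/else in `train` can also be collapsed with `Map.computeIfAbsent`, which creates a follower list the first time a word is seen and returns the existing list afterwards, so no second `put` is needed. A minimal sketch under that approach (the class `NeighborMapSketch` and method `buildNeighborMap` are hypothetical; the real predictor stores the result in its `neighborMap` field). It also leaves out the `System.out.println(neighborMap)` debug line, which prints the whole map on every call to `train`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical standalone version of the neighborMap construction in train()
class NeighborMapSketch {

    static Map<String, List<String>> buildNeighborMap(List<String> words) {
        Map<String, List<String>> neighborMap = new HashMap<>();
        // Stop one short of the end: the last word has no follower
        for (int i = 0; i < words.size() - 1; i++) {
            neighborMap
                .computeIfAbsent(words.get(i), key -> new ArrayList<>())
                .add(words.get(i + 1));
        }
        return neighborMap;
    }
}
```

Duplicate followers are kept on purpose, as in the original loop: a word that follows "the" twice appears twice in its list, so a uniform random pick from that list favours it proportionally.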

/**
@@ -101,7 +122,34 @@ public void train(Scanner scanner) {
public String predictNextWord(List<String> context) {
// TODO: Return a predicted word given the words preceding it
// Hint: only the last word in context should be looked at
return null;

// if number of words to generate is 1, "upon" will be generated
String startingWord = "";
List<String> temp = new ArrayList<String>();
temp.addAll(getNeighborMap().keySet());
startingWord = temp.get(0);
System.out.println("TEMP: " + temp);
System.out.println("STARTING WORD: " + startingWord);
Comment on lines +127 to +132

I like where you're headed with this, but I think there might be some misunderstanding about what context represents. Always feel free to come by tutoring or office hours if you're unsure!


context = getNeighborMap().get(startingWord); // build off the fact that index0 is always gonna be first
System.out.println("CURRENT CONTEXT: " + context);

int max = context.size()-1;
int min = 0;
Random r = new Random();
int randomNum = r.nextInt(max-min+1) + min;

String randomWord = "";

randomWord = context.get(randomNum);
context = getNeighborMap().get(randomWord);

System.out.println(" | next word: " + randomWord);
System.out.println(" | next context: " + context);
System.out.println(" | size of context: " + context.size());

return randomWord;
//return randomWord; // "upon" is still generated with 1, even when null.
}
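Following the hint in the original TODO (only the last word in `context` should be looked at), a minimal sketch of the prediction step: read the last element of `context`, look up its followers in the neighbor map, and pick one uniformly at random. The class name `PredictorSketch`, the constructor that takes a prebuilt map and a `Random`, and returning `null` for an empty context or an unseen word are all assumptions for illustration, not the assignment's specification.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical predictor; the real class builds its neighborMap in train()
class PredictorSketch {
    private final Map<String, List<String>> neighborMap;
    private final Random random;

    PredictorSketch(Map<String, List<String>> neighborMap, Random random) {
        this.neighborMap = neighborMap;
        this.random = random;
    }

    String predictNextWord(List<String> context) {
        if (context.isEmpty()) {
            return null; // assumption: nothing to predict from
        }
        // context holds the words so far; only the most recent one matters
        String lastWord = context.get(context.size() - 1);

        List<String> followers = neighborMap.get(lastWord);
        if (followers == null || followers.isEmpty()) {
            return null; // assumption: signal "no known follower" with null
        }
        // nextInt(n) returns 0..n-1, so every follower index is reachable
        return followers.get(random.nextInt(followers.size()));
    }
}
```

Passing the `Random` in (for example `new Random(42)`) makes the choice reproducible in tests; this is a design suggestion rather than something the assignment requires.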

/**
1 change: 1 addition & 0 deletions training.txt
@@ -0,0 +1 @@
Hello world. This is Dr.Smith's hello example.