61 changes: 39 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# ramblebot
# ramblebot v-Alex Bolshakov

A project to exercise Java, JUnit, git, GitHub, and code-reading skills. Students will create a language model to generate text.

@@ -7,12 +7,14 @@ A project to exercise Java, JUnit, git, GitHub, and code-reading skills. Student
### Academic Honesty

THIS IS AN INDIVIDUAL PROJECT. The following is not allowed:

- You MAY NOT copy any code from an AI.
- You MAY NOT paste any of the project or your code into an AI.
- You MAY NOT copy another student's code.
- You MAY NOT copy substantial portions of your solution from the internet.

You may:

- You are allowed to talk about the project generally with other students.
- You are allowed to get help from tutors, so long as you write all the code and they do not walk you through step by step.
- You are allowed to get help in office hours.
@@ -23,37 +25,38 @@ You may:
YOU ARE EXPECTED TO MAKE SMALL, FREQUENT COMMITS. Doing so is good practice and helps me see that it's less likely you pasted in a large part of your solution from elsewhere.

### Timeline

This is a large, difficult project. Start early, and get help when you need it.

## Setup

1. Fork and clone this project. MAKE SURE TO CLONE FROM YOUR FORK. The clone URL should have your username in it.
1. Change into the project directory:
```
cd ramblebot
```
1. Open the project in VS Code.
```
code .
```
If the above command does not work, you can open VS Code manually and select the ramblebot folder to open.
1. Open `RambleApp.java`. Click anywhere in the text of the file.
1. Scroll to the bottom to find the `main` method. There should be a small grey "run" button above it. Click "Run".
![Run Button in VS Code](images/run_button.png)
Sometimes this button takes a little bit to show up when you first open VS Code. If you're not seeing it, make sure you have the Java extension pack installed and it is active.
1. It should ask you for a filename. Give it the following filename:
```
wikipediaData.txt
```
Then hit enter.
1. It should ask you for a number of words. Enter a positive integer and hit enter.
1. You should expect to see an error message. This is good! The error message should end like this:
```
No tokens returned from tokenizer!
This is probably because you haven't implemented it yet
Begin with Wave 1 in the instructions, and implement LowercaseSentenceTokenizer
If you have implemented it, there's a bug in your code where it's returning null for the tokens.
```
1. Open the testing side panel by clicking on the beaker on the left of your screen. ![Test Runner Sidebar in VS Code](images/test_runner.png)
1. Hover over `ramblebot`. A few grey triangles should appear. Click the triangle the furthest to the left.
1. You should expect to see all the tests fail. This is good! You haven't written your solution yet, so it's expected for them to fail.
@@ -64,42 +67,55 @@ Sometimes this button takes a little bit to show up when you first open VS Code.
The goal of this project is to make a bot that can generate new text in the style of some writer. It will do this by reading some input writing, and then word-by-word generating new text that mimics it.

## Wave 1

In wave 1, you will start implementing `tokenize` in `LowercaseSentenceTokenizer.java`. The goal is to take a scanner, read through it, and return a list of words that were separated by spaces/newlines. For example, if the scanner had the following text:

```
this is a lowercase sentence without a period
```

Then tokenize should return a list that looks like this:

```
["this", "is", "a", "lowercase", "sentence", "without", "a", "period"]
```

I recommend not worrying about periods or capitalization yet. You will improve your code in later waves to handle them. `testTokenizeWithNoCapitalizationOrPeriod` in `LowercaseSentenceTokenizerTest` will exercise this functionality. The other tests will likely still fail. This is OK! You'll tackle them in later waves. Add, commit, and push your code if you have not already!
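
A minimal sketch of the whitespace-splitting idea, assuming a plain `java.util.Scanner` with its default delimiter. `WhitespaceSplitSketch` and `splitOnWhitespace` are hypothetical names used only for illustration; they are not part of the starter code, and the sketch does not handle periods or capitalization:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the ramblebot starter code.
final class WhitespaceSplitSketch {

    // Reads every whitespace-separated token from the scanner, in order.
    static List<String> splitOnWhitespace(Scanner scanner) {
        List<String> words = new ArrayList<>();
        // hasNext() skips any run of spaces or newlines before looking for a token,
        // so it pairs safely with next().
        while (scanner.hasNext()) {
            words.add(scanner.next());
        }
        return words;
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner("this is a lowercase sentence without a period");
        System.out.println(splitOnWhitespace(scanner));
    }
}
```

Because the default delimiter treats any run of whitespace as a single separator, repeated spaces and newlines do not produce empty tokens.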

## Wave 2

In wave 2, you will add your own test. You should test that your code properly handles input with many spaces. For example, something like:

```
hello hi hi hi hello hello
```

Write your test in `LowercaseSentenceTokenizerTest` where indicated by the comment, and verify it passes. Fix any bugs in your code if you find them. Add, commit, and push your code if you have not already!

## Wave 3

In wave 3, you will finish the implementation of `tokenize`. Read the Javadoc carefully to understand what to do. Successfully completing this wave should make the remaining tests in `LowercaseSentenceTokenizerTest` pass. Add, commit, and push your code!

## Wave 4

In wave 4 you will finish the implementation of `train` in `UnigramWordPredictor`. One line of the implementation is already provided for you; you do not need to change it. Write the rest of the implementation below it. Read the Javadoc on `train` carefully to understand what is expected. I also recommend reading `testTrainAndGetNeighborMap` in `UnigramWordPredictorTest` as it gives an example input/output. Successfully completing your code should make that test pass. This method is probably the hardest part of the project. Look back at compound data structures and ask for help from tutors or in office hours if needed. Make sure to add, commit, and push your code frequently!
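
As a rough illustration of the kind of compound data structure involved, here is a sketch that maps each word to the list of words that follow it. It assumes the map has the shape `Map<String, List<String>>`; the exact contents `train` must produce are defined by its Javadoc and by `testTrainAndGetNeighborMap`, which are not reproduced here. `NeighborMapSketch` and `buildNeighborMap` are hypothetical names, not part of the starter code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the ramblebot starter code.
final class NeighborMapSketch {

    // Maps each word to every word that directly follows it, in order of appearance.
    // Assumption: a word that is never followed by anything gets no entry in this sketch.
    static Map<String, List<String>> buildNeighborMap(List<String> words) {
        Map<String, List<String>> neighbors = new HashMap<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            // computeIfAbsent creates the list the first time a word is seen.
            neighbors.computeIfAbsent(words.get(i), key -> new ArrayList<>())
                     .add(words.get(i + 1));
        }
        return neighbors;
    }

    public static void main(String[] args) {
        System.out.println(buildNeighborMap(List.of("the", "cat", "sat", "the", "dog")));
    }
}
```

Duplicates are kept in each list on purpose: a word that follows another more often then appears more often among the candidates.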

## Wave 5

In wave 5 you will implement `predictNextWord` in `UnigramWordPredictor`. As part of implementing this, you will need to research how to generate random numbers in Java. Read the Javadoc carefully, and read the comments on the remaining tests. Once you complete this wave, all tests in the project should pass.
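
One way to pick a random element from a list, sketched with `java.util.Random`. `nextInt(bound)` returns a value from 0 (inclusive) to `bound` (exclusive) and throws `IllegalArgumentException` when `bound` is not positive, so an empty list needs its own handling. `RandomPickSketch` and `pickRandom` are hypothetical names, and throwing on an empty list is an assumption of this sketch rather than something the Javadoc is known to require:

```java
import java.util.List;
import java.util.Random;

// Hypothetical demo class, not part of the ramblebot starter code.
final class RandomPickSketch {

    // Returns one element of candidates, chosen uniformly at random.
    static String pickRandom(List<String> candidates, Random random) {
        if (candidates.isEmpty()) {
            // Assumption for this sketch: an empty list is treated as a caller error.
            throw new IllegalArgumentException("no candidates to choose from");
        }
        int index = random.nextInt(candidates.size()); // 0 <= index < size
        return candidates.get(index);
    }

    public static void main(String[] args) {
        System.out.println(pickRandom(List.of("cat", "dog", "bird"), new Random()));
    }
}
```

Passing the `Random` in as a parameter lets a test supply a seeded instance for repeatable results.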

## Wave 6

In wave 6 you will validate that your bot works by having it generate new text. Choose some source of text (poems, songs, essays, etc.) and put it in a new file in the root of the repository. Name it something descriptive like `oscarWildeTraining.txt`.

Then, run the main method of `RambleApp.java` again (see instructions partway through the Getting Started section). Have it use your new text file. Have it generate at least 100 words. Save the output into a new file `ramblebotOutput.txt`. Experiment with different training data sources and see if you can have it make something funny/interesting/profound!

Once you have it working and passing tests, congrats! You are finished! Make sure to add, commit, push, open a PR, and submit the link on Canvas. You can choose to continue on to the bonus extensions even if you have already made a PR. Your PR will automatically be updated with new commits you push.

## Bonus Extensions

Consider doing any of the following (some are very hard!):

- Adding more tests to the classes you implemented
- Testing `RambleApp`
- Creating a Bigram predictor
@@ -108,6 +124,7 @@ Consider doing any of the following (some are very hard!):
- Anything else you find interesting!

## Submitting

Submit your project by making a PR and copying the link to the Canvas assignment.

TURN SOMETHING IN BY THE DUE DATE EVEN IF YOU'RE NOT FINISHED.
57 changes: 57 additions & 0 deletions jesseWoodsTraining.txt
@@ -0,0 +1,57 @@
Slow down
Give yourself the run around
And just hold your horses
Outlaw external forces

And float above the boys in the gray suits
I know you like the rush
But the comedowns yet to come

Trudge through the electro madness
Buckle bunny Im behind you
Those machines for killing fascists
They dont make them like they used to
And when the comedown catches me
Before I catch myself

Ride them in my baby
Dont you try to save me

And float above the boys in the gray suits
I know you like the rush
But the comedowns yet to come

Trudge through the electro madness
Buckle bunny Im behind you
Those machines for killing sadness
They dont make them like they used to
And when the comedown catches me
Before I catch myself

Ride them in my baby
Dont you try to save me

Youll never crack my crescendo
Vanta blackout with my soul sister
Through the false love out the window
Tie me down forever and a day I dont care

Youll never crack my crescendo
Vanta blackout with my soul sister
Through the false love out the window
Tie me down forever and a day I dont care

Honeymoon radiation
Im gonna burn it down for good
Go for broke, as country folk
Never thinking about if we should

Honeymoon radiation
Im gonna burn it down for good
Go for broke, as country folk
Never thinking about if we should

Youll never crack my crescendo
Vanta blackout with my soul sister
Through the false love out the window
Tie me down forever and a day I dont care
9 changes: 9 additions & 0 deletions result.txt
@@ -0,0 +1,9 @@
slow down forever and float above the comedown catches me down for good go for broke,
as country folk never crack my crescendo vanta blackout with my crescendo vanta blackout
with my baby dont care youll never crack my soul sister through the electro madness buckle bunny
Comment on lines +1 to +3

Really makes you think

im behind you like they dont you like the window tie me before i catch myself ride them
like the false love out the false love out the boys in the false love out the rush but the
gray suits i catch myself ride them like the false love out the window tie me and when the
rush but the window tie me and float above the boys in the false love out the rush but the
comedowns yet to and when the comedown catches me down forever and float above the false love
out the gray suits i dont care honeymoon radiation
22 changes: 21 additions & 1 deletion src/LowercaseSentenceTokenizer.java
@@ -1,3 +1,4 @@
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

@@ -30,7 +31,26 @@ public class LowercaseSentenceTokenizer implements Tokenizer {
*/
public List<String> tokenize(Scanner scanner) {
// TODO: Implement this function to convert the scanner's input to a list of words and periods
return null;
List<String> contentList = new ArrayList<>();
List<String> result = new ArrayList<>();

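// hasNext() pairs with next(); hasNextLine() can be true when only whitespace remains, making next() throw
while (scanner.hasNext()) {
result.add(scanner.next().toLowerCase());
}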

for (String str : result) {
if(str.charAt(str.length()-1) != '.'){
contentList.add(str);
}else{
String temp = str.replace('.', ' ');
contentList.add(temp.replace(" ", ""));
contentList.add(".");
}
}
Comment on lines +34 to +49

Great logic!




return contentList;
}
}

10 changes: 7 additions & 3 deletions src/LowercaseSentenceTokenizerTest.java
@@ -16,9 +16,13 @@ void testTokenizeWithNoCapitalizationOrPeriod() {
}

// Wave 2
/*
* Write your test here!
*/
@Test
void testTokenWithManySpaces(){
LowercaseSentenceTokenizer tokenizer = new LowercaseSentenceTokenizer();
Scanner scanner = new Scanner("hello hi hi hi hello hello");
List<String> token = tokenizer.tokenize(scanner);
assertEquals(List.of("hello", "hi", "hi", "hi", "hello", "hello"), token);
}
Comment on lines +19 to +25

Nice test



// Wave 3
18 changes: 16 additions & 2 deletions src/UnigramWordPredictor.java
@@ -2,6 +2,7 @@
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Scanner;

/**
@@ -49,9 +50,18 @@ public UnigramWordPredictor(Tokenizer tokenizer) {
* @param scanner the Scanner to read the training text from
*/
public void train(Scanner scanner) {
// TODO: Convert the trainingWords into neighborMap here
neighborMap = new HashMap<>();
List<String> trainingWords = tokenizer.tokenize(scanner);
neighborMap.put(trainingWords.get(0),new ArrayList<>());

// TODO: Convert the trainingWords into neighborMap here
for(int i = 1; i < trainingWords.size(); i++){
neighborMap.get(trainingWords.get(i-1)).add(trainingWords.get(i));

if(!neighborMap.containsKey(trainingWords.get(i))){
neighborMap.put(trainingWords.get(i), new ArrayList<String>());
}
}
Comment on lines +58 to +64

Smart initializing the list in the previous pass

}

/**
@@ -101,7 +111,11 @@ public void train(Scanner scanner) {
public String predictNextWord(List<String> context) {
// TODO: Return a predicted word given the words preceding it
// Hint: only the last word in context should be looked at
return null;
//Random class information gathered at https://www.geeksforgeeks.org/generating-random-numbers-in-java/
Random rand = new Random();
String lastWord = context.get(context.size()-1);
int randomStringIndex = rand.nextInt(neighborMap.get(lastWord).size());
return neighborMap.get(lastWord).get(randomStringIndex);
Comment on lines +116 to +118

Nice variable names

}

/**