The idea is to build a simple search index over a document using only the text up to a specific point. The motivation is to let a reader query what they have already read, refreshing their memory of characters and earlier events without spoiling anything that has not yet happened.
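As a rough illustration of that idea, the sketch below builds an index over only the lines read so far, so a query can never surface anything past the reader's position. The function and names here are illustrative assumptions, not this repository's actual API:

```julia
# Hypothetical sketch of the core idea; names are illustrative, not the repo's API.
# Index only the lines up to `last_read_line`, so queries cannot spoil later text.
function build_index(lines::Vector{String}, last_read_line::Int)
    index = Dict{String, Vector{Int}}()        # word => line numbers where it appears
    for (i, line) in enumerate(lines[1:last_read_line])
        for word in split(lowercase(line), r"\W+"; keepempty=false)
            push!(get!(index, word, Int[]), i)
        end
    end
    return index
end
```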
Once the repository is cloned, you need to populate the raw data.
- Run `./scrape.sh` to download some books from Project Gutenberg to start with.
- Run `julia main.jl -write -overwrite -ingest -clean` to populate the database with the cleaned data from the downloaded books (the cleaning step is sketched after this list).
On future runs, you can drop `-overwrite` if you have added new books via `./scrape.sh` and just want to add their data to the database.
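For reference, the `-clean` step conceptually amounts to something like the following. This is a minimal sketch of my own, assuming Project Gutenberg's standard `*** START OF` / `*** END OF` markers; the repository's actual cleaner may do more (e.g. chapter headings):

```julia
# Minimal sketch (an assumption, not the repo's actual cleaner): strip Project
# Gutenberg front and end matter using the standard START/END markers.
function strip_gutenberg(raw::AbstractString)
    lines = split(raw, '\n')
    start = findfirst(l -> occursin("*** START OF", l), lines)
    stop  = findlast(l -> occursin("*** END OF", l), lines)
    lo = start === nothing ? 1 : start + 1     # fall back to the full text if a
    hi = stop === nothing ? length(lines) : stop - 1  # marker is missing
    return join(lines[lo:hi], '\n')
end
```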
Then, you should be able to query the data with a command such as

```
julia main.jl -read -title "The Great Gatsby" -query "Daisy Gatsby soup for dinner"
```

which will query the text of "The Great Gatsby" and return the excerpt that best matches the query, with two lines of context before and after (a sketch of the matching follows the excerpt):
It was dark now, and as we dipped under a little bridge I put my arm
around Jordan’s golden shoulder and drew her toward me and asked her
**to dinner. Suddenly I wasn’t thinking of Daisy and Gatsby any more,**
but of this clean, hard, limited person, who dealt in universal
scepticism, and who leaned back jauntily just within the circle of my
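The matching behind this output might look roughly like the following. This is a sketch under my own assumptions, scoring each line by simple query-term overlap; the actual implementation (which may use TF or TF-IDF weighting, per the list below) need not match it:

```julia
# Hedged sketch of the excerpt matching: score each line by query-term overlap,
# then return the best line plus `context` lines on either side.
function best_excerpt(lines::Vector{String}, query::String; context::Int=2)
    terms = Set(split(lowercase(query)))
    score(l) = count(w -> w in terms, split(lowercase(l), r"\W+"; keepempty=false))
    best = argmax(score.(lines))                      # index of the best-scoring line
    lo, hi = max(1, best - context), min(length(lines), best + context)
    return join(lines[lo:hi], '\n')
end
```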
There are a few additional steps to run the web front end:
- Run `julia --startup-file=no -e 'using DaemonMode; serve()'` in the background.
- Run `php -S localhost:8000 process.php` from the `./src/` directory.
- Open a web browser and navigate to `http://localhost:8000/main.html`.
Here, the specific calls to the Julia scripts are wrapped by the PHP back end.
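Each request presumably ends up running something like DaemonMode.jl's documented client pattern, which sends `main.jl` to the already-running daemon instead of paying Julia's startup cost on every query. The exact command is my assumption based on DaemonMode.jl's `runargs()` client:

```julia
# Assumed shape of the call the PHP back end shells out to (per DaemonMode.jl's
# documented client pattern); `main.jl` runs inside the persistent daemon.
cmd = `julia --startup-file=no -e 'using DaemonMode; runargs()' main.jl -read -title "The Great Gatsby" -query "Daisy Gatsby soup for dinner"`
run(cmd)
```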
- Automated scraper and cleaning pipeline
- TF weighting within document
- TF-IDF weighting based on the full set of downloaded books (one standard formulation is sketched after this list)
- Query by sentence rather than by line
- Model for identifying character names
- Better cleaning of things like chapter headings, front/end matter
- Code for extracting n-grams instead of just individual words
- Don't extract n-grams across sentence breaks
- Test performance and efficacy of n-gram indexing for n=2,3
- Create word vector embedding for semantic matches
- [Perf] Serialize data, such as word frequencies, between runs
- Create table for global word frequencies and document frequencies
- Refactor to use a struct with metadata for words
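For the TF-IDF item above, one standard formulation weighs a word by tf(w, d) * log(N / df(w)). This is a sketch of that common variant, not necessarily the one the project will settle on; `docs`, `tfidf`, and the bag-of-words representation are illustrative:

```julia
# Sketch of standard TF-IDF weighting over a set of books (assumed structure).
# `docs` maps a title to its bag of words.
function tfidf(docs::Dict{String, Vector{String}})
    N = length(docs)
    df = Dict{String, Int}()                   # document frequency per word
    for words in values(docs), w in Set(words)
        df[w] = get(df, w, 0) + 1
    end
    weights = Dict{String, Dict{String, Float64}}()
    for (title, words) in docs
        tf = Dict{String, Int}()               # raw term counts within this book
        for w in words
            tf[w] = get(tf, w, 0) + 1
        end
        weights[title] = Dict(w => c / length(words) * log(N / df[w]) for (w, c) in tf)
    end
    return weights
end
```

A nice property of this weighting is that words appearing in every downloaded book get weight zero, which naturally suppresses common words in query matching.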