Word Prediction: Coursera Data Science Capstone Project
========================================================
author: Harrison Hassig
date: March 26, 2017
autosize: true
Introduction
========================================================
This presentation showcases the skills I learned in the Data Science Specialization offered by Coursera and Johns Hopkins University (JHU). For this capstone project, JHU partnered with SwiftKey (http://swiftkey.com) to apply data science to natural language processing, specifically to predicting the next word from a small data set.

The objective of this project was to build a working predictive text model. The data used in the model came from a **corpus** called HC Corpora (www.corpora.heliohost.org).
Algorithm Development
========================================================
The algorithm that predicts the next word in a user-entered text string is based on a classic **N-gram** model. Using a cleaned subset of the provided data set, I computed unigrams, bigrams, and trigrams.
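
As a rough illustration of what computing unigrams, bigrams, and trigrams involves, here is a minimal Python sketch. It is not the code behind this project: the `ngram_counts` helper and the toy sentence are assumptions made for the example, and the real model was built from the cleaned corpus sample.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy sentence standing in for the cleaned corpus sample (illustrative only).
tokens = "the cat sat on the mat and the cat sat down".split()

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# Look up how often a given word, pair, or triple occurred.
print(unigrams[("the",)])
print(bigrams[("the", "cat")])
print(trigrams[("the", "cat", "sat")])
```

Each table maps a tuple of words to the number of times that sequence appeared, which is all an N-gram model needs in order to estimate how likely a word is to follow a given context.
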
These n-gram counts let me predict the next word of a user-entered string of words. That prediction is the core of my Shiny application, which is my capstone project and what I am pitching to you today.
More information can be found here:
https://rpubs.com/HHassig/251855
The Shiny Application
========================================================
Using the algorithm and plan described in the report linked on the previous slide, I built a Shiny application (shiny.rstudio.com) and hosted it here:
https://hhassig.shinyapps.io/Capstone/
The application runs our proprietary algorithm: it accepts a word or phrase as input and suggests the word that is *believed* most likely to be the one the user wants next. The suggestion comes from a model trained on the data set that combines trigram, bigram, and unigram estimates via linear interpolation.
The source files for this project can be found here:
https://github.com/HHassig/Coursera-Data-Science-Capstone
Using the Application
========================================================
Type in a phrase and hit "Predict!" to get the top word according to our algorithm, along with its likelihood as a percentage. The alternative choices and their percentages are listed as well.

We believe this application is easy to use, and that the underlying model can be reused and scaled beyond this web app. Please enjoy our live demonstration; we look forward to your questions.
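
The ranking step behind that output could look roughly like the Python sketch below. The `top_predictions` function and the probabilities are made up for illustration and are not taken from the application.

```python
def top_predictions(probs, k=3):
    """Return the k most likely words with probabilities as percentages."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [(word, round(100 * p, 1)) for word, p in ranked[:k]]


# Hypothetical model output for the word following some input phrase.
probs = {"time": 0.25, "day": 0.42, "week": 0.15, "year": 0.18}

results = top_predictions(probs)
best, alternatives = results[0], results[1:]

print(f"Top word: {best[0]} ({best[1]}%)")
for word, pct in alternatives:
    print(f"Alternative: {word} ({pct}%)")
```
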
-HarrisonHassig@fakeemail.com