Predicting stock prices using a TensorFlow LSTM (long short-term memory) neural network.
## 1) Introduction
Predicting stock prices is a challenging task, as prices do not follow any specific pattern: changes in stock prices are driven purely by supply and demand over a period of time. To learn the specific characteristics of a stock price, we can use deep learning to identify these patterns. One of the best-known networks for series forecasting is [LSTM](https://www.researchgate.net/publication/13853244_Long_Short-term_Memory) (long short-term memory), a Recurrent Neural Network (RNN) that is able to remember information over a long period of time, which makes it extremely useful for predicting stock prices. RNNs are well suited to time series data: they process the data step by step, maintaining an internal state in which they keep a summarised version of the information they have seen so far. The successful prediction of a stock's future price could yield a significant profit.
The data comes as a JSON document that we could use later on to create our Security Master, if we ever wanted to store this data somewhere to keep track of the securities we are going to trade. As the data could come with different fields, my suggestion is to store it in a Data Lake, so we can build it from multiple sources without having to worry too much about how the data is structured.
### 2.2) Market Data Download
The previous step helps us identify several characteristics of a given ticker symbol, so we can use its properties to define some of the charts I'm showing below. Note that the yFinance library only requires the ticker symbol of the stock to download, along with the start and end dates of the period we want to fetch. Additionally, we can specify the granularity of the data using the interval parameter. By default, the interval is 1 day, and this is the one I will use for my training.
To download the data we can use the following command:
Now that we have the data we want to use, we need to define our training and validation sets. As stock behaviour can vary depending on the dates, the function I have created requires 3 basic arguments:
- Ticker Symbol: **GOOG**
- Start Date: The date from which the historical data starts; in this case, **2004-Aug-01**.
- Validation Date: The date from which the data is treated as validation data; in this case, we specify **2017-01-01**.
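The date-based split those arguments drive can be sketched roughly as follows (the function name and internals are assumptions, not the repository's actual code; synthetic data is used for illustration):

```python
import pandas as pd

def split_by_validation_date(data, validation_date):
    # Rows strictly before the validation date form the training set;
    # everything from that date onwards becomes the validation set.
    training = data[data.index < validation_date]
    validation = data[data.index >= validation_date]
    return training, validation

# Example with a synthetic daily price series straddling 2017-01-01
prices = pd.DataFrame(
    {"Close": range(10)},
    index=pd.date_range("2016-12-27", periods=10, freq="D"),
)
training, validation = split_by_validation_date(prices, "2017-01-01")
```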
Note that you will need to have configured [TensorFlow](https://www.tensorflow.org/), Keras, and a GPU in order to run the samples below.
In this exercise, I'm only interested in the [closing price](https://www.investopedia.com/terms/c/closingprice.asp), which is the standard benchmark for stocks and securities.
Below you can find the chart with the division we will create between Training Data and Validation Data:
In order to normalise the data, we need to scale it between 0 and 1 so that we work on a common scale. To accomplish this, we can use the preprocessing tool [MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) as seen below:
```python
from sklearn.preprocessing import MinMaxScaler

# Scale all values into the [0, 1] range
min_max = MinMaxScaler(feature_range=(0, 1))
```
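As a quick illustration of what the scaler does (synthetic values, not the project's data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

min_max = MinMaxScaler(feature_range=(0, 1))
# fit_transform learns the min/max of the column and maps it onto [0, 1]
scaled = min_max.fit_transform(np.array([[10.0], [20.0], [30.0]]))
```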
### 3.3) Adding Timesteps
An LSTM network needs the data as a 3D array. To translate our 2D array into a 3D one, we use a short [timestep](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) to loop through the data, create smaller partitions, and feed them into the model. The final array is then reshaped into (training samples, x timesteps, 1 feature per step). The code below represents this concept:
```python
time_steps = 3
```
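The windowing described above can be sketched as follows (the helper name and details are illustrative, not the repository's actual code):

```python
import numpy as np

def create_timesteps(series, time_steps):
    # Build overlapping windows: each sample holds `time_steps` consecutive
    # values (x) and the value that immediately follows them (y).
    x, y = [], []
    for i in range(time_steps, len(series)):
        x.append(series[i - time_steps:i])
        y.append(series[i])
    x = np.array(x)
    # Reshape to (samples, time_steps, 1 feature) as the LSTM expects
    return x.reshape(x.shape[0], x.shape[1], 1), np.array(y)

x_train, y_train = create_timesteps(np.arange(10, dtype=float), 3)
```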
This model is slightly fine-tuned to reach the lowest validation loss. In this example, we reach a validation loss of 0.14% with an MSE (Mean Square Error) of 0.14%, which is relatively good and provides us with a very accurate result.
The training result can be seen below:
```
MSE : 0.0014113366
```
Now it is time to prepare our testing data and send it through our deep-learning model to obtain the predictions.
First, we need to import the test data, applying the same time-step transformation we used for the training data:
```python
# Testing Data Transformation
```
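One important detail in that transformation (a sketch with synthetic values, not the repository's actual code): the scaler is fitted on the training data only and then reused on the test data, so both live on the same scale.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit on (synthetic) training values only...
min_max = MinMaxScaler(feature_range=(0, 1)).fit(np.arange(20.0).reshape(-1, 1))

# ...then reuse the fitted scaler on the (synthetic) test values,
# so the test data is scaled with the training min/max.
test_scaled = min_max.transform(np.arange(10.0, 20.0).reshape(-1, 1))
```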
## 4) Usage
This has been built using Python 3.6.8.
Download the source code and install the following packages: