Portfolios/Code/12_SignalExhibits.R
Lines changed: 16 additions & 4 deletions
@@ -1,3 +1,15 @@
+# """
+# Inputs: relies on documentation tables and signal CSVs generated by the upstream Python pipeline (requires `alldocumentation`, `pathProject`, `pathPredictors`, `pathDataIntermediate`, `pathResults` in scope).
+# Outputs: writes `coverage.xlsx` and intermediate fst files; also produces correlation exhibits and related plots.
+# How to run:
+# Rscript 12_SignalExhibits.R
+# Example:
+# Rscript 12_SignalExhibits.R
+# """
+
+# Ensure CRAN mirror is defined for non-interactive runs
README.md
Lines changed: 49 additions & 29 deletions
@@ -38,9 +38,36 @@ The code is separated into three folders:

 We separate the code so you can choose which parts you want to run. If you only want to create signals, you can run the files in `Signals/pyCode/` and then do your thing. If you just want to create portfolios, you can skip the signal generation by directly downloading its output via the [data page](https://www.openassetpricing.com/). The whole thing is about 15,000 lines, so you might want to pick your battles.

-More details are below.
+More details are below

-### 1. Signals/pyCode/
+### `Signals/pyCode` Instructions
+
+**1. Set up for Creating Signals (Python and R)**
+
+* Install Python dependencies:
+```bash
+cd Signals/pyCode/
+pip install -r requirements.txt
+```
+* Install required R packages. [tbc]
+* Copy `Signals/pyCode/dotenv.template` to `Signals/pyCode/.env` and add your WRDS and FRED credentials.
+  - For FRED credentials, request an [API key from FRED](https://research.stlouisfed.org/docs/api/api_key.html)
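Before running the pipeline, it can help to confirm that `.env` actually contains the credentials. The sketch below is only an illustration, not part of the repository: the key names `WRDS_USERNAME`, `WRDS_PASSWORD`, and `FRED_API_KEY` are taken from the earlier README wording and may not match `dotenv.template` exactly, and the helper names `parse_env` and `missing_keys` are hypothetical.

```python
from pathlib import Path

# Assumed key names (from the older README text); adjust to match dotenv.template.
REQUIRED_KEYS = ("WRDS_USERNAME", "WRDS_PASSWORD")
OPTIONAL_KEYS = ("FRED_API_KEY",)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env_text, keys=REQUIRED_KEYS):
    """Return the keys that are absent or empty in the given .env text."""
    values = parse_env(env_text)
    return [k for k in keys if not values.get(k)]


if __name__ == "__main__":
    env_path = Path("Signals/pyCode/.env")
    if not env_path.exists():
        print(f"{env_path} not found; copy dotenv.template to .env first")
    else:
        text = env_path.read_text()
        print("missing required:", missing_keys(text))
        print("missing optional:", missing_keys(text, OPTIONAL_KEYS))
```

A missing optional key would presumably only affect the signals that depend on that source, so the check reports required and optional keys separately.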
+
+**2. (Optional) Generate Prep Data**
+
+This is only necessary for a handful of signals
+
+If you have bash:
+* from `Signals/pyCode/`
+  - run `bash prep1_run_on_wrds.sh` to copy the prep scripts to the WRDS Cloud
+  - wait about 5 hours
+  - use qstat to check if it's still running
+  - if impatient, check most recent file in `~/temp_prep/log/` on WRDS server.
+  - run `bash prep2_dl_from_wrds.sh` to download the prep data from the WRDS Cloud to `Signals/pyData/Prep/`
+
+You can alternatively upload to the WRDS Cloud manually, ssh into WRDS, run `qsub run_all_prep.sh`, and then manually download the prep data.
+
+**3. Run the Signals Code**

 `master.py` runs the end-to-end Python pipeline. It calls the staged scripts in:

@@ -49,26 +76,16 @@ More details are below.
 * `Predictors/` constructs stock-level predictors and outputs to `Signals/pyData/Predictors/`
 * `Placebos/` constructs "not predictors" and "indirect evidence" signals and outputs to `Signals/pyData/Placebos/`

-The orchestrator blocks are written to keep running even if a particular download fails (for example due to a missing subscription) so you get as much data as possible. You can track progress in `Signals/Logs/`.
-
-#### Minimal Setup
-
-1. From `Signals/pyCode/`, create a Python 3 virtual environment (e.g. `python3 -m venv .venv`) and install the requirements via `pip install -r requirements.txt` after activating the environment. `set_up_pyCode.py` automates these steps if you prefer.
-2. Copy `dotenv.template` to `.env` and populate credentials such as `WRDS_USERNAME`, `WRDS_PASSWORD`, and any other keys you need (e.g. `FRED_API_KEY`).
-3. Run the full pipeline with `python master.py` (from inside `Signals/pyCode/`). You can also run `01_DownloadData.py` and `02_CreatePredictors.py` individually if you just need part of the workflow.
-4. Outputs are written to `Signals/pyData/`, and detailed logs are saved under `Signals/Logs/`.
+**To run:**
+```bash
+cd Signals/pyCode/
+python master.py
+```

-#### Optional Setup
-
-The minimal setup produces the vast majority of signals. Thanks to exception handling, the pipeline will keep going even if a particular source is unavailable.
-
-To reproduce every signal:
+The orchestrator blocks are written to keep running even if a particular download fails (for example due to a missing subscription) so you get as much data as possible. You can track progress in `Signals/Logs/`.
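The keep-running behavior described above can be illustrated with a small sketch. This is not the code in `master.py`; it is a generic pattern under the assumption that each step is a separate command, and the function name `run_all` and the step names in the demo are hypothetical.

```python
import subprocess
import sys


def run_all(commands):
    """Run each command in order; record a failure instead of stopping."""
    results = {}
    for name, cmd in commands.items():
        try:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run(cmd, check=True)
            results[name] = "ok"
        except (subprocess.CalledProcessError, OSError) as err:
            results[name] = f"failed: {err}"
    return results


if __name__ == "__main__":
    # Stand-in steps: the middle one fails, the last one still runs.
    demo_steps = {
        "download_a": [sys.executable, "-c", "pass"],
        "download_b": [sys.executable, "-c", "raise SystemExit(1)"],
        "download_c": [sys.executable, "-c", "pass"],
    }
    for step, status in run_all(demo_steps).items():
        print(step, "->", status)
```

Collecting the statuses in one place makes it easy to see afterwards which sources were skipped, which is the same role the README assigns to the files in `Signals/Logs/`.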

-* For IBES, 13F, OptionMetrics, and bid-ask spread signals, run the helper scripts in `Signals/pyCode/PrepScripts/` (many are designed for WRDS Cloud) and place the resulting files in `Signals/pyData/Prep/`.
-* For signals that use the VIX, inflation, or broker-dealer leverage, request an [API key from FRED](https://research.stlouisfed.org/docs/api/api_key.html) and add `FRED_API_KEY` to `.env` before running the download scripts.
-* For signals that rely on patent citations, BEA input-output tables, or Compustat customer data, ensure that `Rscript` is available on your system because some helper scripts shell out to R.

-### 2. Portfolios/Code/
+### `Portfolios/Code` Instructions

 `master.R` runs everything. It:

@@ -78,29 +95,32 @@ To reproduce every signal:

 It also uses `SignalDoc.csv` as a guide for how to run the portfolios.

-By default the code skips the daily portfolios (`skipdaily = T`), and takes about 8 hours, assuming you examine all 300 or so signals. However, the baseline portfolios (based on predictability results in the original papers) will be done in just 30 minutes. You can keep an eye on how it's going by checking the csvs outputted to `Portfolios/Data/Portfolios/`. Every 30 minutes or so the code should output another set of portfolios. Adding the daily portfolios (`skipdaily = F`) takes an additional 12ish hours.
-
-#### Minimal Setup
+**To run:**
+* Option 1 - Command line:
+```bash
+cd Portfolios/Code/
+Rscript master.R
+```
+* Option 2 - RStudio: Open `master.R` in RStudio and click "Source" or press Ctrl+Shift+S (Cmd+Shift+S on Mac)

-All you need to do is set `pathProject` in `master.R` to the project root directory (where `SignalDoc.csv` is). Then `master.R` will create portfolios for Price, Size, and STreversal in `Portfolios/Data/Portfolios/`.
+**Before running:** You must set `pathProject` in `master.R`(line 30) to your project root directory (where `SignalDoc.csv` is located). If using RStudio, `pathProject = paste0(getwd(), '/')` should work automatically.

-#### Probable Setup
+By default the code skips the daily portfolios (`skipdaily = T`), and takes about 8 hours, assuming you examine all 300 or so signals. However, the baseline portfolios (based on predictability results in the original papers) will be done in just 30 minutes. You can keep an eye on how it's going by checking the csvs outputted to `Portfolios/Data/Portfolios/`. Every 30 minutes or so the code should output another set of portfolios. Adding the daily portfolios (`skipdaily = F`) takes an additional 12ish hours.
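One way to keep an eye on the csvs mentioned above is to list the most recently written files. The sketch below is an assumption-laden illustration: the function name `newest_csvs` is hypothetical, and it searches subfolders as well because the exact layout under `Portfolios/Data/Portfolios/` is not stated here.

```python
from pathlib import Path


def newest_csvs(folder, limit=5):
    """Return up to `limit` csv paths under `folder`, newest first."""
    root = Path(folder)
    if not root.is_dir():
        return []
    files = sorted(
        root.rglob("*.csv"),
        key=lambda p: p.stat().st_mtime,  # last-modified time
        reverse=True,
    )
    return files[:limit]


if __name__ == "__main__":
    # Run from the project root while master.R is working.
    for path in newest_csvs("Portfolios/Data/Portfolios/"):
        print(path)
```

If the newest file stops changing for much longer than the roughly 30-minute cadence described above, that may be a hint to check whether the run is still alive.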

-You probably want more than Price, Size, and STreversal portfolios, and so you probably want to set up more signal data before you run `master.R`.
+#### Minimal Setup

+To get started quickly, `master.R` will create portfolios for Price, Size, and STreversal in `Portfolios/Data/Portfolios/`.
 There are a couple ways to set up this signal data:

 * Run the code in `Signals/pyCode/` (see above).
 * Download `Firm Level Characteristics/Full Sets/PredictorsIndiv.zip` and `Firm Level Characteristics/Full Sets/PlacebosIndiv.zip` via the [data page](https://sites.google.com/site/chenandrewy/open-source-ap) and unzip to `Signals/Data/Predictors/` and `Signals/Data/Placebos/`.
 * Download only some selected csvs via the [data page](https://sites.google.com/site/chenandrewy/open-source-ap) and place in `Signals/Data/Predictors/` (e.g. just download `BM.csv`, `AssetGrowth.csv`, and `EarningsSurprise.csv` and put them in `Signals/Data/Predictors/`).


-### 3. Shipping/Code/
+### `Shipping/Code` Instructions

 This code zips up the data, makes some quality checks, and copies files for uploading to Gdrive. You shouldn't need to use this but we keep it with the rest of the code for replicability.

-----

-## Contribute

-Please let us know if you find typos in the code or think that we should add additional signals. You can let us know about any suggested changes via pull requests for this repo. We will keep the code up to date for other researchers to use it.