Commit e6b44cb

Chuck Litzell authored and Rahul Iyer committed
Cox Proportional Hazards: Edit documentation to maintain consistency
1 parent 6ec85c3 commit e6b44cb

1 file changed (+68, -93 lines)

src/ports/postgres/modules/stats/cox_prop_hazards.sql_in

Lines changed: 68 additions & 93 deletions
@@ -18,8 +18,8 @@ m4_include(`SQLCommon.m4')
 <div class="toc"><b>Contents</b>
 <ul>
 <li class="level1"><a href="#training">Training Function</a>
+<li class="level1"><a href="#cox_zph">PHA Test Function</a>
 <li class="level1"><a href="#examples">Examples</a></li>
-<li class="level1"><a href="#cox_zph">PHA test function</a>
 <li class="level1"><a href="#background">Technical Background</a></li>
 <li class="level1"><a href="#related">Related Topics</a></li>
 </ul>
@@ -38,22 +38,20 @@ the probability that death has happened before time t.
 
 Following is the syntax for the coxph_train() training function:
 <pre class="syntax">
-coxph_train(
-source_table,
-output_table,
-dependent_variable,
-independent_variable,
-right_censoring_status,
-strata,
-optimizer_params
-)
+coxph_train( source_table,
+output_table,
+dependent_variable,
+independent_variable,
+right_censoring_status,
+strata,
+optimizer_params
+)
 </pre>
-
 \b Arguments
 <dl class="arglist">
 <dt>source_table</dt>
 <dd>TEXT. The name of the table containing input data.</dd>
-<dt>out_table</dt>
+<dt>output_table</dt>
 <dd>TEXT. The name of the table where the output model is saved.
 The output is saved in the table named by the <em>output_table</em> argument. It has the following columns:
 <table class="output">
@@ -63,7 +61,7 @@ coxph_train(
 </tr>
 <tr>
 <th>loglikelihood</th>
-<td>FLOAT8. Log-likelihood value of the MLE estimate</td>
+<td>FLOAT8. Log-likelihood value of the MLE estimate.</td>
 </tr>
 <tr>
 <th>std_err</th>
@@ -79,29 +77,29 @@ coxph_train(
 </tr>
 <tr>
 <th>hessian</th>
-<td>The vectorized Hessian matrix computed using the final solution.</td>
+<td>FLOAT8[]. The vectorized Hessian matrix computed using the final solution.</td>
 </tr>
 <tr>
 <th>num_iterations</th>
-<td>The number of iterations performed by the optimizer</td>
+<td>INTEGER. The number of iterations performed by the optimizer.</td>
 </tr>
 </table>
 </dd>
-<dd> Additionally, an output summary table is also generated that contains
-a summary of the parameters used for building the cox model. It is stored
+<dd> Additionally, a summary output table is generated that contains
+a summary of the parameters used for building the Cox model. It is stored
 in a table named <em>output_table</em>_summary. It has the following columns:
 <table class="output">
 <tr>
 <th>source_table</th>
-<td>Source table name</td>
+<td>The source table name.</td>
 </tr>
 <tr>
 <th>dep_var</th>
-<td>dependent variable name</td>
+<td>The dependent variable name.</td>
 </tr>
 <tr>
 <th>ind_var</th>
-<td>independent variable name</td>
+<td>The independent variable name.</td>
 </tr>
 <tr>
 <th>right_censoring_status</th>
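The output table reports the log-likelihood, the vectorized Hessian and the iteration count, and the documented optimizer_params default is optimizer='newton' with max_iter=20 and tolerance=1e-4. To illustrate what those columns mean, here is a minimal Python sketch of Newton-Raphson on the Cox partial log-likelihood for a single covariate. It is not MADlib's implementation: the function names are hypothetical, tied times are handled with Breslow-style risk sets (every subject with t_j >= t_i), and std_err is assumed to be sqrt(-1/H).

```python
import math


def cox_partial_stats(beta, times, events, x):
    """Partial log-likelihood, score and Hessian of a one-covariate Cox model.

    Hypothetical helper for illustration; risk sets follow the Breslow
    convention (subject j is at risk at t_i when times[j] >= t_i).
    """
    loglik = score = hessian = 0.0
    for i, t_i in enumerate(times):
        if not events[i]:
            continue  # censored subjects only contribute through risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(weights)
        s1 = sum(w * x[j] for w, j in zip(weights, risk))
        s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk))
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        hessian -= s2 / s0 - (s1 / s0) ** 2  # minus the risk-set variance of x
    return loglik, score, hessian


def cox_newton_1d(times, events, x, max_iter=20, tolerance=1e-4):
    """Newton-Raphson fit; the defaults mirror the documented optimizer_params."""
    beta = 0.0
    num_iterations = 0
    for num_iterations in range(1, max_iter + 1):
        _, score, hessian = cox_partial_stats(beta, times, events, x)
        step = -score / hessian  # Newton update: beta - U(beta) / H(beta)
        beta += step
        if abs(step) < tolerance:
            break
    loglik, _, hessian = cox_partial_stats(beta, times, events, x)
    std_err = math.sqrt(-1.0 / hessian)  # assumed: inverse observed information
    return beta, loglik, std_err, hessian, num_iterations


# Hypothetical toy data: six subjects, all events observed, one binary covariate.
times = [1, 2, 3, 4, 5, 6]
events = [True] * 6
x = [1, 0, 1, 0, 1, 0]
coef, loglik, std_err, hessian, num_iterations = cox_newton_1d(times, events, x)
print(coef, loglik, std_err, hessian, num_iterations)
```

Because the partial log-likelihood is concave in beta, Newton's method usually converges in a handful of iterations, which is why a small max_iter default is reasonable.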
@@ -116,7 +114,7 @@ coxph_train(
 
 <dt>dependent_variable</dt>
 <dd>TEXT. A string containing the name of a column that contains
-an array of numeric values, or a string expression in the format 'array[1, x1, x2, x3]',
+an array of numeric values, or a string expression in the format 'ARRAY[1, x1, x2, x3]',
 where <em>x1</em>, <em>x2</em> and <em>x3</em> are column names. Dependent
 variables refer to the time of death. There is no need to pre-sort the data.</dd>
 <dt>independent_variable</dt>
@@ -128,78 +126,60 @@ coxph_train(
 expression (i.e., 'true', 'false', '0', '1') that applies to all observations,
 or a Boolean expression such as 'column_name < 10' that can be evaluated for each
 observation.</dd>
-<dt>strata</dt>
-<dd>VARCHAR, default: NULL, which does not do any stratifications. It should be a string that contains the column names separated by commas, which are the columns (strata ID variables) used to do stratification.</dd>
-<dt>optimizer_params</dt>
+<dt>strata (optional)</dt>
+<dd>VARCHAR, default: NULL, which does not do any stratifications. A string of comma-separated column names that are the strata ID variables used to do stratification.</dd>
+<dt>optimizer_params (optional)</dt>
 <dd>VARCHAR, default: NULL, which uses the default values of optimizer parameters: max_iter=20, optimizer='newton', tolerance=1e-4. It should be a string that contains pairs of 'key=value' separated by commas.</dd>
 </dl>
 
-@anchor notes
-@par Notes
-
-- All table names can be optionally schema qualified (current_schemas() would be
-used if a schema name is not provided) and all table and column names
-should follow case-sensitivity and quoting rules per the database.
-(For instance, 'mytable' and 'MyTable' both resolve to the same entity, i.e. 'mytable'.
-If mixed-case or multi-byte characters are desired for entity names then the
-string should be double-quoted; in this case the input would be '"MyTable"').
-
-- The cox_prop_hazards_regr() and cox_prop_hazards() functions have been
-deprecated; coxph_train() should be used instead.
-
 @anchor cox_zph
-@par Test the Proportional Hazards Assumption of a Cox Regression
+@par Proportional Hazards Assumption Test Function
 
-Proportional-Hazard models enable the comparison of various survival models.
+The cox_zph() function tests the proportional hazards assumption (PHA) of a Cox regression.
+
+Proportional-hazard models enable the comparison of various survival models.
 These PH models, however, assume that the hazard for a given individual is a
 fixed proportion of the hazard for any other individual, and the ratio of the
-hazards is constant across time. We currently don't provide performing any
-transformation of the time to compute the correlation.
+hazards is constant across time. MADlib does not currently have support for
+performing any transformation of the time to compute the correlation.
 
 The <em>cox_zph()</em> function is used to test this assumption by computing the correlation
 of the residual of the coxph_train model with time.
 
-To display a brief summary of the PH assumption test function, call the \ref cox_zph()
-function with no argument:
-@verbatim
-SELECT madlib.cox_zph();
-@endverbatim
-
 Following is the syntax for the cox_zph() function:
 <pre class="syntax">
-cox_zph(
-cox_model_table,
-output_table
-)
+cox_zph( cox_model_table,
+output_table
+)
 </pre>
-
 \b Arguments
 <dl class="arglist">
 <dt>cox_model_table</dt>
-<dd>TEXT. The name of the table containing the Cox Proportional-Hazards model</dd>
+<dd>TEXT. The name of the table containing the Cox Proportional-Hazards model.</dd>
 
 <dt>output_table</dt>
-<dd>TEXT. The name of the table where the test statistics are saved</dd>
+<dd>TEXT. The name of the table where the test statistics are saved.
 The output table is named by the <em>output_table</em> argument and has
-the following columns
+the following columns:
 <table class="output">
 <tr>
 <th>rho</th>
 <td>FLOAT8[]. Vector of the correlation coefficients between
-survival time and the scaled Schoenfeld residuals </td>
+survival time and the scaled Schoenfeld residuals.</td>
 </tr>
 <tr>
 <th>chi_square</th>
-<td> FLOAT8[]. Chi-square test statistic for the correlation analysis</td>
+<td> FLOAT8[]. Chi-square test statistic for the correlation analysis.</td>
 </tr>
 <tr>
 <th>p_value</th>
-<td>FLOAT8[]. Two-side p-value for the chi-square statistic</td>
+<td>FLOAT8[]. Two-side p-value for the chi-square statistic.</td>
 </tr>
 </table>
+</dd>
 </dl>
 
-Additionally, the residual values are outputed in table named as <em>output_table</em>_residual.
+Additionally, the residual values are outputted to the table named <em>output_table</em>_residual.
 The table contains the following columns:
 <table class="output">
 <tr>
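The cox_zph() output pairs a correlation rho with a chi_square statistic and its p_value for each covariate. The Python sketch below covers the two pieces that can be stated without MADlib internals. It assumes rho is a plain Pearson correlation against untransformed time (the documentation notes that no time transformation is applied) and that each chi-square statistic has one degree of freedom, as in R's cox.zph. How MADlib computes chi_square itself is not shown in this diff, and the function names are hypothetical. For one degree of freedom, the upper-tail chi-square probability equals a two-sided standard-normal tail probability: P(X > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).

```python
import math


def pearson_rho(times, residuals):
    """Pearson correlation between event times and one covariate's
    scaled Schoenfeld residuals (hypothetical helper, untransformed time)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_r = sum(residuals) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, residuals))
    var_t = sum((t - mean_t) ** 2 for t in times)
    var_r = sum((r - mean_r) ** 2 for r in residuals)
    return cov / math.sqrt(var_t * var_r)


def chi_square_p_value_1df(chi_square):
    """Upper-tail p-value of a 1-degree-of-freedom chi-square statistic.

    A 1-df chi-square variable is Z**2 for standard normal Z, so
    P(X > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical residuals that trend upward with time, hinting at a PHA violation.
rho = pearson_rho([1, 2, 3, 4], [-0.5, -0.1, 0.2, 0.6])
# 3.841458820694124 is the 95th percentile of chi-square(1), so p is about 0.05.
p = chi_square_p_value_1df(3.841458820694124)
print(rho, p)
```

A rho near zero with a large p_value, as in the sample_zph_output example, is what one would expect when the proportional hazards assumption is not contradicted by the data.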
@@ -213,19 +193,31 @@ The table contains the following columns:
 </tr>
 <tr>
 <th>scaled_reisdual</th>
-<td>Residual values scaled by the variance of the coefficients</td>
+<td>Residual values scaled by the variance of the coefficients.</td>
 </tr>
 </table>
 
+@anchor notes
+@par Notes
+
+- Table names can be optionally schema qualified (current_schemas() is
+used if a schema name is not provided) and table and column names
+should follow case-sensitivity and quoting rules per the database.
+For instance, 'mytable' and 'MyTable' both resolve to the same entity&mdash;'mytable'.
+If mixed-case or multi-byte characters are desired for entity names then the
+string should be double-quoted; in this case the input would be '"MyTable"'.
+
+- The cox_prop_hazards_regr() and cox_prop_hazards() functions have been
+deprecated; coxph_train() should be used instead.
+
 @anchor examples
 @examp
 -# View online help for the proportional hazards training method.
 <pre class="example">
 SELECT madlib.coxph_train();
-SELECT madlib.coxph_train('usage');
 </pre>
 
--# Create an input data set:
+-# Create an input data set.
 <pre class="example">
 DROP TABLE IF EXISTS sample_data;
 CREATE TABLE sample_data (
@@ -262,16 +254,16 @@ COPY sample_data FROM STDIN DELIMITED BY '|';
 24 | 1 | 5 | 1 | t
 \.
 </pre>
--# Run the cox regression function:
+-# Run the Cox regression function.
 <pre class="example">
-SELECT madlib.coxph_train(
-'sample_data',
-'sample_cox',
-'timedeath',
-'ARRAY[grp,wbc]',
-'status');
+SELECT madlib.coxph_train( 'sample_data',
+'sample_cox',
+'timedeath',
+'ARRAY[grp,wbc]',
+'status'
+);
 </pre>
--# View the results of the regression:
+-# View the results of the regression.
 <pre class="example">
 \\x on
 SELECT * FROM sample_cox;
@@ -284,20 +276,19 @@ std_err | {0.677308807341768,0.387308633304678}
 z_stats | {3.75676700265663,4.31653830257251}
 p_values | {0.000172122613528057,1.58495189046891e-05}
 </pre>
--# View online help for function to test Proportional Hazards assumption
+-# View online help for the function to test Proportional Hazards Assumption.
 <pre class="example">
-SELECT madlid.cox_zph();
-SELECT madlib.cox_zph('usage');
+SELECT madlib.cox_zph();
 </pre>
 
 -# Run the test for Proportional Hazards assumption to obtain correlation between
 residuals and time.
 <pre class="example">
-SELECT madlid.cox_zph(
-'sample_cox',
-'sample_zph_output');
+SELECT madlib.cox_zph( 'sample_cox',
+'sample_zph_output'
+);
 </pre>
--# View results of the PHA test
+-# View results of the PHA test.
 <pre class="example">
 SELECT * FROM sample_zph_output;
 </pre>
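The example output lists std_err, z_stats and p_values for the two covariates of sample_cox. A quick Python check shows that the reported p_values agree with the two-sided standard-normal tail probability of z_stats, 2 * (1 - Phi(|z|)). This is consistent with Wald tests, but it is an observation from the printed numbers rather than a statement about MADlib's code; the helper name is hypothetical.

```python
import math


def wald_two_sided_p(z):
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values copied from the sample_cox output in the example.
z_stats = [3.75676700265663, 4.31653830257251]
p_values = [0.000172122613528057, 1.58495189046891e-05]

for z, reported in zip(z_stats, p_values):
    recomputed = wald_two_sided_p(z)
    print(f"z = {z:.6f}  reported p = {reported:.6e}  recomputed p = {recomputed:.6e}")
```

Using math.erfc directly avoids the cancellation error that 1 - Phi(|z|) would suffer for large |z|.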
@@ -310,22 +301,6 @@ p_value | {0.991991010621734,0.878854560410758}
 </pre>
 
 
-@anchor seealso
-@sa File cox_prop_hazards.sql_in documenting the functions
-
-@anchor notes
-@note
-
-- Table names can be optionally schema qualified (current_schemas() is
-searched if a schema name is not provided) and table and column names
-should follow case-sensitivity and quoting rules per the database.
-(For instance, 'mytable' and 'MyTable' both resolve to the same entity, i.e. 'mytable'.
-If mixed-case or multi-byte characters are desired for entity names then the
-string should be double-quoted; in this case the input would be '"MyTable"'.
-
-- The cox_prop_hazards_regr() function has been deprecated, and
-cox_prop_hazards() should be used instead.
-
 @anchor background
 @par Technical Background
 