Complete module documentation for the vstats library.
Core vector and matrix operations.
Files:
- `vectors.v` - Vector operations (add, subtract, dot product, distance, magnitude)
- `matrix.v` - Matrix operations
- `util.v` - Utility functions
- Tests: `vectors_test.v`, `matrix_test.v`
Key Functions:
- `add(v, w)` - Vector addition
- `subtract(v, w)` - Vector subtraction
- `dot(v, w)` - Dot product
- `magnitude(v)` - Vector magnitude
- `distance(v, w)` - Euclidean distance
- `sum_of_squares(v)` - Sum of squares
- `flatten(m)` - Flatten matrix to vector
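The formulas behind these primitives are compact. A Python sketch for illustration (the library itself is written in V; these helper names mirror, but are not, its API):

```python
import math

def dot(v, w):
    # Sum of elementwise products
    return sum(vi * wi for vi, wi in zip(v, w))

def magnitude(v):
    # Euclidean norm: sqrt(v . v)
    return math.sqrt(dot(v, v))

def distance(v, w):
    # Euclidean distance = magnitude of the difference vector
    return magnitude([vi - wi for vi, wi in zip(v, w)])

print(dot([1.0, 2.0], [3.0, 4.0]))       # 11.0
print(magnitude([3.0, 4.0]))             # 5.0
print(distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```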
Statistical measures, aggregations, and advanced statistical tests.
Files:
- `descriptive.v` - Descriptive statistics and hypothesis tests
- `descriptive_test.v` - Unit tests
- `advanced_tests_test.v` - Advanced statistics test suite
Descriptive Functions:
- `mean(x)` - Arithmetic mean
- `median(x)` - Median value
- `mode(x)` - Mode (most frequent values)
- `variance(x)` - Sample variance
- `standard_deviation(x)` - Sample standard deviation
- `correlation(x, y)` - Pearson correlation
- `covariance(x, y)` - Covariance between two variables
- `quantile(x, p)` - Quantile at probability p
- `interquartile_range(x)` - IQR (Q3 - Q1)
- `skewness(x)` - Distribution asymmetry (3rd moment)
- `kurtosis(x)` - Distribution tailedness (4th moment, excess)
Advanced Statistical Tests:
- `anova_one_way(groups)` - One-way ANOVA F-test for comparing group means
- `confidence_interval_mean(x, confidence_level)` - CI for population mean
- `cohens_d(group1, group2)` - Cohen's d effect size for mean differences
- `cramers_v(contingency)` - Cramér's V effect size for categorical association
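Cohen's d is the mean difference scaled by the pooled standard deviation. A Python sketch of the standard formula (illustrative only; it assumes the module uses sample variances and a pooled SD, which is the common definition):

```python
import math

def cohens_d(group1, group2):
    # Mean difference divided by the pooled standard deviation (ddof = 1)
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    s1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    s2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # 0.5
```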
Probability density functions (PDF), cumulative distribution functions (CDF), and distribution utilities.
Files:
- `distributions.v` - All probability distributions
Continuous Distributions:
- Normal: `normal_cdf()`, `inverse_normal_cdf()`
- Exponential: `exponential_pdf()`, `exponential_cdf()`
- Uniform: `uniform_pdf()`, `uniform_cdf()`
- Gamma: `gamma_pdf()`
- Chi-squared: `chi_squared_pdf()`
- Student's t: `students_t_pdf()`
- F-Distribution: `f_distribution_pdf()`
- Beta: `beta_pdf()`
Discrete Distributions:
- Bernoulli: `bernoulli_pdf()`, `bernoulli_cdf()`
- Binomial: `binomial_pdf()`
- Poisson: `poisson_pdf()`, `poisson_cdf()`
- Negative Binomial: `negative_binomial_pdf()`, `negative_binomial_cdf()`
- Multinomial: `multinomial_pdf()`
Utilities:
- `expectation(x, p)` - Expected value
- `beta_function(x, y)` - Beta function
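The normal CDF has no closed form but is conventionally expressed through the error function. A Python sketch of the usual formula (illustrative, not the module's V implementation):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Phi((x - mu) / sigma) written via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_cdf(0.0))             # 0.5
print(round(normal_cdf(1.96), 3))  # 0.975
```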
Numerical optimization algorithms for finding gradients and performing gradient descent.
Files:
- `algorithms.v` - Optimization algorithms
Key Functions:
- `difference_quotient(f, x, h)` - Numerical derivative
- `partial_difference_quotient(f, v, i, h)` - Partial derivative
- `gradient(f, v, h)` - Compute full gradient vector
- `gradient_step(v, gradient, step_size)` - Update parameters via gradient descent
- `sum_of_squares_gradient(v)` - Gradient of sum of squares function
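These functions compose into a basic gradient-descent loop. A Python sketch of that composition, using a central difference quotient (helper names are illustrative, not the V signatures):

```python
def gradient(f, v, h=1e-6):
    # Central difference quotient for each partial derivative
    grads = []
    for i in range(len(v)):
        up = [x + (h if j == i else 0.0) for j, x in enumerate(v)]
        dn = [x - (h if j == i else 0.0) for j, x in enumerate(v)]
        grads.append((f(up) - f(dn)) / (2.0 * h))
    return grads

def gradient_step(v, grad, step_size):
    # Move against the gradient to decrease f
    return [x - step_size * g for x, g in zip(v, grad)]

def sum_of_squares(v):
    return sum(x * x for x in v)

v = [3.0, -2.0]
for _ in range(100):
    v = gradient_step(v, gradient(sum_of_squares, v), 0.1)
# v is now very close to the minimiser [0.0, 0.0]
```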
Symbolic algebra and expression manipulation.
Files:
- `symbol.v` - Symbolic operations
General utility functions, evaluation metrics, and training helpers.
Files:
- `utils.v` - Basic utility functions
- `metrics.v` - Classification/regression metrics and training utilities
- `datasets.v` - Dataset loading and splitting
- Test files: `*_test.v`
Basic Functions:
- `factorial(n)` - Factorial computation
- `combinations(n, k)` - Binomial coefficient
- `range(n)` - Generate range of integers
Evaluation Metrics (metrics.v):
- `build_confusion_matrix(y_true, y_pred)` - Build confusion matrix structure
- `(ConfusionMatrix).accuracy()` - Accuracy metric
- `(ConfusionMatrix).precision()` - Precision metric
- `(ConfusionMatrix).recall()` - Recall/sensitivity metric
- `(ConfusionMatrix).specificity()` - Specificity metric
- `(ConfusionMatrix).f1_score()` - F1 score
- `(ConfusionMatrix).false_positive_rate()` - FPR metric
- `(ConfusionMatrix).summary()` - Formatted summary of all metrics
- `roc_curve(y_true, y_proba)` - Calculate ROC curve and AUC
- `(ROC_Curve).auc_value()` - Extract AUC value
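All of these classification metrics derive from the four confusion-matrix counts. A Python sketch of the definitions (function and variable names here are illustrative, not the V API):

```python
def binary_confusion(y_true, y_pred):
    # Count TP / FP / TN / FN for binary labels (1 = positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, tn, fn = binary_confusion(y_true, y_pred)
precision = tp / (tp + fp)          # correct among predicted positives
recall = tp / (tp + fn)             # correct among actual positives
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / len(y_true)
```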
Utility Metrics (metrics.v):
- `binary_classification_metrics(y_true, y_pred)` - All binary metrics in one call
- `regression_metrics(y_true, y_pred)` - All regression metrics (MSE, RMSE, MAE, R²)
- `generate_param_grid(param_ranges)` - Generate parameter combinations for grid search
Training Utilities (metrics.v):
- `(TrainingProgress).format_log()` - Format progress for logging
- `early_stopping(losses, patience)` - Check early stopping criterion
- `decay_learning_rate(initial_lr, epoch, decay_rate)` - Exponential LR scheduler
Dataset Functions (datasets.v):
- `load_iris()` - Iris classification dataset (150 samples, 4 features, 3 classes)
- `load_wine()` - Wine classification dataset (178 samples, 13 features→4, 3 classes)
- `load_breast_cancer()` - Breast cancer classification dataset
- `load_boston_housing()` - Boston housing regression dataset (506 samples, 13 features→3)
- `load_linear_regression()` - Synthetic linear regression data
- `(Dataset).summary()` - Dataset summary statistics
- `(Dataset).train_test_split(test_size)` - Split dataset
- `(Dataset).xy()` - Get features and targets as separate arrays
- Similar methods exist for `RegressionDataset`
Supervised and unsupervised learning algorithms.
Files:
Regression models and evaluation metrics.
Models:
- `LinearModel` - Linear regression model
- `LogisticModel` - Logistic regression model
Key Functions:
- `linear_regression(x, y)` - Fit OLS linear regression
- `linear_predict(model, x)` - Predict with linear model
- `logistic_regression(x, y, iterations, lr)` - Fit logistic regression
- `logistic_predict(model, x, threshold)` - Binary classification
- `logistic_predict_proba(model, x)` - Prediction probabilities
Evaluation Metrics:
- `mse(y_true, y_pred)` - Mean Squared Error
- `rmse(y_true, y_pred)` - Root Mean Squared Error
- `mae(y_true, y_pred)` - Mean Absolute Error
- `r_squared(y_true, y_pred)` - R² coefficient of determination
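For reference, all four regression metrics can be computed in one pass over the `(y_true, y_pred)` pairs. A Python sketch of the formulas (illustrative, not the module's V code):

```python
import math

def regression_metrics(y_true, y_pred):
    # MSE, RMSE, MAE, and R² from paired true/predicted values
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return mse, math.sqrt(mse), mae, r2

mse, rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
# one unit of error on one of three points: mse = mae = 1/3, r2 = 0.5
```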
Unsupervised clustering algorithms.
Models:
- `KMeansModel` - K-means cluster model
- `HierarchicalClustering` - Hierarchical cluster result
Key Functions:
- `kmeans(data, k, max_iterations)` - K-means clustering
- `kmeans_predict(model, data)` - Cluster assignment for new data
- `kmeans_inertia(model, data)` - Inertia (sum of squared distances)
- `silhouette_coefficient(data, labels)` - Cluster quality measure
- `hierarchical_clustering(data, num_clusters)` - Agglomerative clustering (single linkage)
- `dbscan(data, eps, min_points)` - Density-based clustering; returns labels (0 = noise, >0 = cluster ID)
Deep learning components for building neural networks.
Files:
Neural network layers and activation functions.
Layer Types:
- `DenseLayer` - Fully connected layer
- `ActivationLayer` - Non-linear activation
- `BatchNormLayer` - Batch normalization
Activation Functions:
- `relu(x)` - ReLU activation
- `sigmoid(x)` - Sigmoid activation
- `tanh(x)` - Hyperbolic tangent
- `softmax(x)` - Softmax (multi-class output)
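Softmax is the one activation here that needs a numerical-stability trick: subtracting the maximum logit before exponentiating, so large inputs cannot overflow. A Python sketch of that standard form (illustrative):

```python
import math

def softmax(x):
    # Subtract the max before exponentiating so large logits cannot overflow
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
# probs sum to 1 and preserve the ordering of the inputs
big = softmax([1000.0, 1001.0])  # no overflow thanks to max subtraction
```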
Key Functions:
- `dense_layer(input_size, output_size)` - Create dense layer
- `(layer).forward(input)` - Forward pass
- `(layer).backward(grad, input, lr)` - Backward pass (backpropagation)
- `activation_layer(fn_name)` - Create activation layer
- `dropout(input, rate)` - Dropout regularization
- `flatten(data)` - Reshape 2D to 1D
- `reshape(data, rows, cols)` - Reshape 1D to 2D
Convolution & Pooling:
- `conv1d(input, kernel, stride)` - 1D convolution
- `max_pool1d(input, pool_size, stride)` - Max pooling
- `avg_pool1d(input, pool_size, stride)` - Average pooling
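A Python sketch of valid (no-padding) 1D convolution and max pooling over strided windows. The window/stride semantics here follow the common convention and are assumed, not taken from the module:

```python
def conv1d(signal, kernel, stride=1):
    # Valid (no padding) 1D cross-correlation
    k = len(kernel)
    out = []
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * c for w, c in zip(window, kernel)))
    return out

def max_pool1d(signal, pool_size, stride):
    # Max over strided windows
    return [max(signal[i:i + pool_size])
            for i in range(0, len(signal) - pool_size + 1, stride)]

print(conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0]))  # [-2.0, -2.0]
print(max_pool1d([1.0, 3.0, 2.0, 5.0], 2, 2))          # [3.0, 5.0]
```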
Loss functions for training neural networks.
Key Functions:
- `mse_loss(y_true, y_pred)` - Mean Squared Error
- `mae_loss(y_true, y_pred)` - Mean Absolute Error
- `binary_crossentropy_loss(y_true, y_pred)` - Binary classification
- `categorical_crossentropy_loss(y_true, y_pred)` - Multi-class classification
- `sparse_categorical_crossentropy_loss(y_true, y_pred)` - Multi-class (integer labels)
- `hinge_loss(y_true, y_pred)` - SVM-like loss
- `huber_loss(y_true, y_pred, delta)` - Robust to outliers
- `kl_divergence_loss(y_true, y_pred)` - KL divergence
- `cosine_similarity_loss(y_true, y_pred)` - Cosine distance
- `contrastive_loss(y_true, distance, margin)` - Siamese networks
- `triplet_loss(anchor, positive, negative, margin)` - Metric learning
Gradient Functions:
- `mse_loss_gradient()` - MSE gradient
- `mae_loss_gradient()` - MAE gradient
- `binary_crossentropy_loss_gradient()` - BCE gradient
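Binary cross-entropy is where the log-clipping safeguard matters: without clipping, a prediction of exactly 0 or 1 yields an infinite loss. A Python sketch of the loss and its per-prediction gradient (illustrative names; `eps` is an assumed clipping constant):

```python
import math

def binary_crossentropy_loss(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 and 1 so the logs stay finite
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)

def binary_crossentropy_loss_gradient(t, p, eps=1e-12):
    # d(loss)/dp for one prediction
    p = min(max(p, eps), 1.0 - eps)
    return (p - t) / (p * (1.0 - p))

loss = binary_crossentropy_loss([1.0, 0.0], [0.9, 0.1])  # -ln(0.9) ≈ 0.105
worst = binary_crossentropy_loss([1.0], [0.0])  # clipped: large but finite
```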
High-level neural network construction and training.
Main Class:
- `NeuralNetwork` - Sequential neural network
Key Functions:
- `sequential(layer_sizes, activation_fn)` - Create sequential network
- `(nn).forward(input)` - Forward pass
- `(nn).backward(grad, input, lr)` - Backward pass
- `(nn).train(x_train, y_train, config)` - Train network
- `(nn).predict(x)` - Predict on batch
- `(nn).predict_single(x)` - Predict on single sample
- `(nn).evaluate(x_test, y_test)` - Evaluate on test set
- `(nn).get_weights()` - Extract weights
- `(nn).get_biases()` - Extract biases
- `(nn).set_weights(weights)` - Set weights
Configuration:
- `TrainingConfig` - Training parameters (learning_rate, epochs, batch_size, verbose)
- `default_training_config()` - Default config
- `training_config(lr, epochs, batch_size)` - Custom config
Industry-standard experimentation workflows: A/B testing, propensity score matching, and difference-in-differences.
Dependencies: ml, hypothesis, stats, prob, linalg
Files:
A/B testing, power analysis, and CUPED variance reduction.
Structs:
- `ABTestConfig` — `alpha` (default 0.05), `equal_variance`
- `ABTestResult` — means, SDs, lift, Cohen's d, t-stat, df, p-value, CI, significance flag
- `PowerAnalysisResult` — `n_per_group`, `power`, `alpha`, `effect_size`
- `CUPEDResult` — `theta`, `variance_reduction`, `adjusted_result`
Key Functions:
- `abtest(control, treatment, cfg)` — Welch's t-test with effect size, relative lift, and CI
- `power_analysis(effect_size, alpha, power)` — Required n per group via normal approximation
- `cuped_test(y_ctrl, y_treat, pre_ctrl, pre_treat, cfg)` — CUPED-adjusted A/B test using pre-experiment covariates
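CUPED works by regressing the metric on a pre-experiment covariate and subtracting the predicted component: theta = cov(y, pre)/var(pre), then y_adj = y - theta * (pre - mean(pre)). A Python sketch of the adjustment step (illustrative; the module's exact theta estimation may pool both arms):

```python
def cuped_adjust(y, pre):
    # theta = cov(y, pre) / var(pre); subtract the predicted component from y
    n = len(y)
    my = sum(y) / n
    mp = sum(pre) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, pre)) / (n - 1)
    var = sum((b - mp) ** 2 for b in pre) / (n - 1)
    theta = cov / var
    return [a - theta * (b - mp) for a, b in zip(y, pre)], theta

# The pre-period covariate predicts the metric, so adjustment shrinks variance
pre = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
adj, theta = cuped_adjust(y, pre)
# theta is close to 2; the adjusted values cluster tightly around their mean
```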
Sample size calculation for experiments before data collection.
Structs:
- `SampleSizeResult` — `n_per_group`, `total_n`, `alpha`, `power`, `mde`, `baseline`, `effect_size`, `baseline_std`, `method`
Key Functions:
- `sample_size_proportions(baseline_rate, mde, alpha, power)` — n per group for conversion/proportion metrics; `mde` is absolute rate change (e.g. 0.01 for +1pp)
- `sample_size_means(baseline_mean, baseline_std, mde_absolute, alpha, power)` — n per group for continuous metrics; effect size field = Cohen's d
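The proportion calculation follows the standard normal approximation: n = (z_{1-α/2} + z_{1-β})² · (p₁(1-p₁) + p₂(1-p₂)) / mde². A Python sketch (illustrative; the module's exact rounding and variance terms may differ):

```python
import math
from statistics import NormalDist

def sample_size_proportions(baseline_rate, mde, alpha=0.05, power=0.8):
    # Normal-approximation n per group to detect an absolute change of `mde`
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p1 = baseline_rate
    p2 = baseline_rate + mde
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var_sum / mde ** 2)

print(sample_size_proportions(0.10, 0.01))  # roughly 14.7k users per group
```

Note how quadratically the requirement grows: halving the detectable effect quadruples the sample size.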
Two-proportion z-test for comparing conversion rates.
Structs:
- `ProportionTestConfig` — `alpha` (default 0.05); `@[params]`
- `ProportionTestResult` — `rate_a`, `rate_b`, `diff`, `relative_lift`, `z_statistic`, `p_value`, `significant`, `ci_lower`, `ci_upper`, `pooled_se`, `n_a`, `n_b`
Key Functions:
- `proportion_test(successes_a, n_a, successes_b, n_b, cfg)` — Pooled z-test under H₀; CI uses unpooled SE and `alpha` from config
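The pooled z-statistic combines both arms' successes to form the standard error under H₀. A Python sketch of the statistic and two-sided p-value (illustrative; the module additionally returns the unpooled-SE confidence interval):

```python
import math
from statistics import NormalDist

def proportion_test(successes_a, n_a, successes_b, n_b):
    # Pooled two-proportion z-test; returns (z, two-sided p-value)
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = proportion_test(100, 1000, 130, 1000)  # 10% vs 13% conversion
# a lift this size on n=1000 per arm is significant at alpha = 0.05
```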
Sequential Probability Ratio Test (SPRT) for safe interim analysis.
Types:
- `SPRTDecision` — enum: `continue_testing`, `reject_null`, `accept_null`
- `SPRTConfig` — `alpha` (0.05), `beta` (0.20), `mde` (required, no default); NOT `@[params]`
- `SPRTResult` — `log_likelihood_ratio`, `decision`, `upper_boundary`, `lower_boundary`, `rate_a`, `rate_b`, `n_a`, `n_b`
Key Functions:
- `sprt_test(successes_a, n_a, successes_b, n_b, cfg)` — One-shot Bernoulli SPRT over cumulative totals; call repeatedly at each interim check
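Wald's SPRT compares a running log-likelihood ratio against two fixed boundaries derived from α and β. A Python sketch of the boundaries plus one common Bernoulli LLR formulation that tests the treatment arm against H₀ = baseline rate and H₁ = baseline + mde (illustrative; the module's exact likelihood construction may differ):

```python
import math

def sprt_boundaries(alpha=0.05, beta=0.20):
    # Wald's decision boundaries on the log-likelihood ratio
    upper = math.log((1.0 - beta) / alpha)   # cross above => reject H0
    lower = math.log(beta / (1.0 - alpha))   # cross below => accept H0
    return lower, upper

def bernoulli_llr(successes, n, p0, p1):
    # LLR of the treatment arm's data under H1 (rate p1) vs H0 (rate p0)
    return (successes * math.log(p1 / p0)
            + (n - successes) * math.log((1.0 - p1) / (1.0 - p0)))

lower, upper = sprt_boundaries()
# upper ≈ 2.77, lower ≈ -1.56 for alpha = 0.05, beta = 0.20
llr = bernoulli_llr(60, 100, 0.40, 0.50)  # positive: data favour H1
```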
Bayesian A/B test using Beta-Binomial conjugate model.
Structs:
- `BayesianConfig` — `alpha_prior` (1.0), `beta_prior` (1.0), `n_samples` (10000); `@[params]`
- `BayesianResult` — `posterior_mean_a/b`, `prob_b_beats_a`, `expected_loss_a/b`, `ci_lower/upper_a/b`, `successes_a/b`, `n_a/b`
Key Functions:
- `bayesian_ab_test(successes_a, n_a, successes_b, n_b, cfg)` — Beta posteriors via Marsaglia-Tsang sampler; Monte Carlo estimates for P(B>A), expected loss, and 95% credible intervals
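The Beta-Binomial model is simple enough to sketch end to end: with a Beta(α₀, β₀) prior, the posterior after s successes in n trials is Beta(α₀+s, β₀+n-s), and P(B>A) falls out of Monte Carlo sampling. A Python sketch using the standard library's beta sampler rather than Marsaglia-Tsang (illustrative):

```python
import random

def bayesian_ab(successes_a, n_a, successes_b, n_b,
                alpha_prior=1.0, beta_prior=1.0, n_samples=10000, seed=42):
    # Beta(prior+successes, prior+failures) posteriors; Monte Carlo P(B > A)
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        pa = rng.betavariate(alpha_prior + successes_a,
                             beta_prior + n_a - successes_a)
        pb = rng.betavariate(alpha_prior + successes_b,
                             beta_prior + n_b - successes_b)
        if pb > pa:
            wins += 1
    return wins / n_samples

prob = bayesian_ab(80, 1000, 120, 1000)
# with a lift this large (8% -> 12%), P(B beats A) is very close to 1
```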
Propensity score matching and covariate balance checking.
Structs:
- `PropensityModel` — fitted logistic model, scores, treatment vector
- `PropensityConfig` — `iterations`, `learning_rate`, `trim`
- `MatchingConfig` — `caliper`, `replacement`
- `MatchedPair` — `treated_idx`, `control_idx`, `ps_distance`
- `MatchingResult` — pairs, matched/unmatched counts, average distance
- `BalanceResult` — SMDs before/after matching, `mean_abs_smd_*`, `balanced` flag
- `ATEResult` — `ate`, `se`, CI, t-stat, p-value, group sizes
Key Functions:
- `estimate_propensity_scores(x, treatment, cfg)` — Logistic regression for p(T=1|X); optional common-support trimming
- `match_nearest_neighbor(model, cfg)` — Greedy O(n_T × n_C) nearest-neighbor matching
- `check_balance(x, treatment, result)` — Standardised mean differences before and after matching
- `ate_matched(y, treatment, result)` — ATE from matched pairs with two-sample t-test
Difference-in-Differences estimation, regression DiD, parallel trends testing, and event studies.
Structs:
- `DiDConfig` — `alpha`
- `DiDResult` — DiD effect, SE, t-stat, p-value, CI, group changes, cell sizes
- `DiDRegressionResult` — OLS interaction coefficient, SE, CI, R², n
- `ParallelTrendsResult` — slopes per group, slope difference, t-stat, p-value, `parallel_trends_hold`
- `EventStudyResult` — `relative_times`, `effects`, `std_errors`, `t_statistics`, `p_values`, CIs
Key Functions:
- `did_2x2(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post, cfg)` — Classic 2×2 DiD with delta-method SE
- `did_regression(y, x, group, time, cfg)` — OLS with treat×post interaction; OLS standard errors via (X'X)⁻¹
- `test_parallel_trends(y_treated_pre, y_control_pre, time_pre, cfg)` — Tests slope equality in pre-period via pooled OLS
- `event_study(y, group, relative_time, cfg)` — Period-by-period DiD using period -1 as reference
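The 2×2 point estimate itself is just a difference of differences in cell means; the module adds a delta-method SE on top. A Python sketch of the estimate:

```python
def did_2x2(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    # (treated change) minus (control change)
    mean = lambda v: sum(v) / len(v)
    treated_change = mean(y_treat_post) - mean(y_treat_pre)
    control_change = mean(y_ctrl_post) - mean(y_ctrl_pre)
    return treated_change - control_change

effect = did_2x2([9.0, 11.0], [14.0, 16.0], [9.0, 11.0], [11.0, 13.0])
print(effect)  # 3.0: treated rose by 5, control by 2
```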
Statistical tests and hypothesis testing functions.
Files:
- `tests.v` - Statistical hypothesis tests
Parametric Tests:
- `t_test_one_sample(x, mu, params)` - One-sample t-test
- `t_test_two_sample(x, y, params)` - Two-sample t-test (equal variances)
- `correlation_test(x, y, params)` - Test significance of correlation
- `chi_squared_test(observed, expected)` - Goodness of fit test
Non-Parametric Tests:
- `wilcoxon_signed_rank_test(x, y)` - Paired samples test
- `mann_whitney_u_test(x, y)` - Independent samples test
- `shapiro_wilk_test(x)` - Normality test
Return Values:
All tests return a `(test_statistic, p_value)` tuple.
Parameters:
- `TestParams` struct with `alpha` (significance level, default 0.05)
Industry-standard product and marketing metrics, funnel analysis, cohort analysis, and attribution modeling.
Files:
Revenue, customer, and retention metrics.
Revenue Metrics:
- `arpa(revenue, accounts)` - Average Revenue Per Account
- `arpu(revenue, users)` - Average Revenue Per User
- `monthly_recurring_revenue(plan_revenues)` - MRR calculation
- `annual_recurring_revenue(mrr)` - ARR from MRR
Customer Metrics:
- `cac(acquisition_spend, new_customers)` - Customer Acquisition Cost
- `ltv(revenue, users, lifespan)` - Lifetime Value
- `ltv_cac_ratio(...)` - LTV:CAC ratio (healthy: 3:1+)
- `payback_period(cac, monthly_arpu)` - Payback period in months
- `magic_number(net_new_arr, gross_margin, sales_marketing_spend)` - SaaS efficiency
Retention Metrics:
- `churn_rate(customers_lost, total_customers)` - Customer churn rate
- `retention_rate(customers_lost, total_customers)` - 1 - churn rate
- `net_revenue_retention(mrr_start, mrr_end, churn_mrr, expansion_mrr)` - NRR
- `gross_revenue_retention(mrr_start, churn_mrr)` - GRR
Financial Metrics:
- `burn_rate(starting_cash, ending_cash, months)` - Monthly burn rate
- `runway_months(current_cash, monthly_burn)` - Months of runway
Conversion funnel analysis and optimization.
Structs:
- `FunnelStage` — name, users, conversions, dropouts
- `FunnelResult` — stages, conversion_rate, total_conversion
- `FunnelConversion` — from/to, rate, drop_off_rate
Key Functions:
- `create_funnel(stage_names, stage_users)` - Create funnel from stage data
- `stage_conversion_rate(from, to)` - Conversion between stages
- `(FunnelResult).get_conversions()` - Detailed conversion data
- `(FunnelResult).highest_drop_off()` - Stage with most leakage
- `funnel_leakage(funnel)` - Users lost at each stage
- `projected_conversions(funnel, additional_users)` - Project with more traffic
- `segment_funnel(segment_data)` - Compare funnels across segments
Cohort analysis and retention matrix computation.
Structs:
- `CohortPeriod` — period_index, cohort_size, retained, revenue, retention
- `Cohort` — name, periods
- `CohortAnalysis` — cohorts, retention_matrix, avg_retention, ltv_by_period
Key Functions:
- `create_cohort_analysis(cohort_names, initial_sizes, retention_data)` - Build cohort analysis
- `(CohortAnalysis).retention_at_period(cohort, period)` - Retention at specific point
- `(CohortAnalysis).avg_retention_at_period(period)` - Average across cohorts
- `(CohortAnalysis).churn_by_period()` - Monthly churn rates
- `(CohortAnalysis).compare_cohorts(name_a, name_b)` - Compare two cohorts
- `(CohortAnalysis).ltv_projection(periods, avg_revenue)` - Project LTV
Marketing channel attribution modeling.
Structs:
- `AttributionResult` — channel, conversions, revenue, attribution_score
Attribution Models:
- `first_touch_attributes(channels, conversions, revenue)` - 100% to first touch
- `last_touch_attributes(channels, conversions, revenue)` - 100% to last touch
- `linear_attributes(touchpoints, conversions, revenue)` - Equal credit
- `time_decay_attributes(touchpoints, days, conversions, revenue, half_life)` - Recency bias
- `position_based_attributes(touchpoints, conversions, revenue)` - 40/20/40 split
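Time-decay credit is typically an exponential in days-before-conversion with the given half-life, normalized across touchpoints. A Python sketch of that weighting (illustrative; the module's exact decay form is assumed here):

```python
def time_decay_weights(days_before_conversion, half_life):
    # Credit halves for every `half_life` days between touch and conversion
    raw = [0.5 ** (d / half_life) for d in days_before_conversion]
    total = sum(raw)
    return [w / total for w in raw]

weights = time_decay_weights([7.0, 0.0], 7.0)
# the touch 7 days out gets half the raw weight of the day-of touch,
# so normalized credit is [1/3, 2/3]
```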
Channel Analytics:
- `channel_roi(attribution_results, channel_costs)` - ROI per channel
- `optimal_channel_mix(channel_performance, total_budget)` - Budget allocation
- `roas(revenue, ad_spend)` - Return on Ad Spend
- `blended_roas(total_revenue, total_ad_spend)` - Blended ROAS
```v
import ml

// Sample data: 5 samples, 2 features
x := [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
y := [3.0, 5.0, 7.0, 9.0, 11.0]

// Fit model
model := ml.linear_regression(x, y)

// Predict
predictions := ml.linear_predict(model, x)

// Evaluate
mse_score := ml.mse(y, predictions)
```

```v
import ml

data := [[1.0, 1.0], [1.5, 1.5], [10.0, 10.0], [10.5, 10.5]]
model := ml.kmeans(data, 2, 100)

// Evaluate clustering quality
silhouette := ml.silhouette_coefficient(data, model.labels)
```

```v
import nn

// Create network: input(10) -> hidden(5) -> output(1)
mut network := nn.sequential([10, 5, 1], 'relu')

// Prepare data
x_train := [...] // 100 samples, 10 features each
y_train := [...] // 100 targets

// Train
config := nn.training_config(0.01, 100, 32)
network.train(x_train, y_train, config)

// Predict
predictions := network.predict(x_test)
```

```v
import hypothesis

// One-sample t-test
data := [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, p_val := hypothesis.t_test_one_sample(data, 3.0, hypothesis.TestParams{})
if p_val < 0.05 {
	println('Reject null hypothesis')
}
```

- Built-in V modules: `math`, `arrays`, `rand`
- Cross-module dependencies:
  - `ml` depends on `linalg`, `stats`
  - `nn` depends on `linalg`, `math`
  - `hypothesis` depends on `stats`, `prob`
  - `prob` depends on `linalg`, `math`, `utils`
  - `experiment` depends on `ml`, `hypothesis`, `stats`, `prob`, `linalg`
  - `growth` depends on `math` (standalone module)
| Module | Files | Functions | Purpose |
|---|---|---|---|
| linalg | 4 | 20+ | Vector/matrix operations |
| stats | 3 | 18+ | Descriptive & advanced statistics |
| prob | 1 | 20+ | Probability distributions |
| optim | 1 | 5+ | Optimization algorithms |
| ml | 3 | 25+ | Machine learning algorithms |
| nn | 3 | 40+ | Neural networks & layers |
| hypothesis | 1 | 7+ | Statistical hypothesis tests |
| experiment | 7 | 20+ | A/B testing, sample size, proportion z-test, SPRT, Bayesian, PSM, DiD |
| growth | 4 | 30+ | Growth metrics, funnel, cohort, attribution |
| symbol | 1 | ? | Symbolic computation |
| utils | 5 | 35+ | Metrics, utilities, datasets |
- Layered Design: Each module has clear dependencies, with lower-level modules (linalg, utils) supporting higher-level ones (ml, nn, experiment)
- Functional Style: Emphasizes pure functions with minimal state mutation
- Type Safety: Uses V's strong type system with structs for models and configuration
- Separation of Concerns:
  - Algorithms in one file, loss functions in another
  - Models and training in separate files
  - Tests in dedicated test files
- Numerical Stability: Includes safeguards (e.g., clipping in log operations, max subtraction in softmax)
- Extensibility: Modular activation functions, loss functions, and optimization algorithms can be easily extended