A lightweight, fast SQL query processor written in C that enables executing SQL queries directly on CSV files without requiring a database. Features include joins, subqueries, aggregations, arithmetic expressions, and more.
# Install (no Make required)
./install.sh # Unix-like
install.bat # Windows
# Run a query
./build/cq -q "SELECT name, age FROM data.csv WHERE age > 25" -p
# Save results to CSV
cq -q "SELECT * FROM data.csv WHERE role = 'admin'" -o output.csv
# Count matching rows
cq -q "SELECT * FROM data.csv WHERE age > 30" -c
# Run query from file
cq -f assets/example_between.sql -p
# Read query from stdin
cat query.sql | cq -q - -pThe assets/ directory contains ready-to-run SQL examples:
- example_between.sql - BETWEEN operator with SQL comments
- example_aggregation.sql - Complex aggregations with GROUP BY and HAVING
- example_dml.sql - INSERT, UPDATE, DELETE operations
- example_functions.sql - String and math functions
- example_joins.sql - JOIN operations between CSV files
- example_case.sql - CASE expressions (simple and searched forms)
- example_window_functions.sql - Window functions (ROW_NUMBER, RANK, LAG, LEAD, running aggregates)
Run any example:
cq -f assets/example_between.sql -pModify CSV files directly:
# INSERT - Add new rows
cq -q "INSERT INTO 'data/users.csv' (id, name, age) VALUES (100, 'Mario', 30)"
cq -q "INSERT INTO 'data/users.csv' VALUES (101, 'Luigi', 28, 'user', 175, 1)"
# UPDATE - Modify existing rows
cq -q "UPDATE 'data/users.csv' SET age = 31 WHERE name = 'Mario'"
cq -q "UPDATE 'data/users.csv' SET role = 'admin', active = 1 WHERE age > 25"
# DELETE - Remove rows (WHERE clause required for safety)
cq -q "DELETE FROM 'data/users.csv' WHERE active = 0"
cq -q "DELETE FROM 'data/users.csv' WHERE age < 18"
# DELETE all rows (requires --force flag)
cq -q "DELETE FROM 'data/users.csv'" --forceNotes:
- All DML operations modify the CSV file in-place
- DELETE requires WHERE clause (safety measure to prevent accidental data loss)
- Use --force flag to allow DELETE without WHERE clause (deletes all rows)
- Use quotes around file paths with special characters:
'data/file.csv' - Column names in INSERT are optional if providing all values in order
Create new CSV files from query results or with specified schema:
# CREATE TABLE - Save query results to new file
cq -q "CREATE TABLE 'output.csv' AS SELECT * FROM 'data.csv' WHERE age > 25"
# CREATE TABLE - With filtering and aggregation
cq -q "CREATE TABLE 'summary.csv' AS SELECT role, COUNT(*) as cnt, AVG(age) as avg_age FROM 'users.csv' GROUP BY role"
# CREATE TABLE - Combine multiple files
cq -q "CREATE TABLE 'all_data.csv' AS SELECT * FROM 'data1.csv' UNION ALL SELECT * FROM 'data2.csv'"
# CREATE TABLE - With JOIN
cq -q "CREATE TABLE 'report.csv' AS SELECT u.name, r.role_name FROM 'users.csv' AS u JOIN 'roles.csv' AS r ON u.role_id = r.id"
# CREATE TABLE - Create empty file with schema
cq -q "CREATE TABLE 'new_table.csv' (id, name, age, role)"
# CREATE TABLE - Define schema for file without header
cq -q "CREATE TABLE 'data_no_header.csv' AS (col1, col2, col3)"Notes:
- CREATE TABLE AS SELECT materializes query results into a new CSV file
- If file exists, it will be replaced
- Useful for creating derived tables, reports, or intermediate results
- Empty schema creation useful for defining structure before data insertion
- Schema mapping (AS (col1, col2)) creates empty file with specified column names
Modify the structure of existing CSV files:
# ALTER TABLE - Rename a column
cq -q "ALTER TABLE 'users.csv' RENAME COLUMN old_name TO new_name"
# ALTER TABLE - Add a new column (filled with empty values)
cq -q "ALTER TABLE 'users.csv' ADD COLUMN email"
# ALTER TABLE - Drop a column (removes all data in that column)
cq -q "ALTER TABLE 'users.csv' DROP COLUMN middle_name"
# Multiple operations can be chained by running commands sequentially
cq -q "ALTER TABLE 'data.csv' RENAME COLUMN id TO user_id"
cq -q "ALTER TABLE 'data.csv' ADD COLUMN created_at"
cq -q "ALTER TABLE 'data.csv' DROP COLUMN deprecated_field"Supported Operations:
- RENAME COLUMN: Changes the column name in the header
- ADD COLUMN: Adds a new column at the end (existing rows get empty values)
- DROP COLUMN: Removes a column and all its data (cannot drop the last column)
Notes:
- All operations modify the CSV file in place
- Column name matching is case-insensitive
- Cannot drop the last remaining column
- Cannot add a column that already exists
- ADD COLUMN fills existing rows with empty values
- Original file is overwritten (make backups if needed)
Unix-like (Linux, macOS):
- C compiler (GCC or Clang)
- Make
- Standard C library
Linux ARM64 (Raspberry Pi):
- GCC for ARM64 (
gcc-aarch64-linux-gnufor cross-compilation) - Or native GCC on Raspberry Pi OS 64-bit
- Make
- Standard C library
Windows:
- Visual Studio Build Tools (MSVC) or MinGW
- Standard C library
Quick Install (no Make required):
# Unix-like (Linux, macOS, BSD)
./install.sh
# Windows (MSVC)
install.batUsing Make:
Unix-like (Linux, macOS):
# Build the main executable
make
# Build and run all tests
make test
# Clean build artifacts
make cleanWindows (MSVC):
# Build using the provided batch script
install.bat
# Or manually with MSVC
cl.exe /W3 /O2 /Iinclude src\*.c /Febuild\cq.exeWindows (MinGW/Git Bash):
# Use standard Unix Makefile or install script
make
# or
./install.shRaspberry Pi (ARM64):
# Native build on Raspberry Pi
./install.sh
# or
make
# Cross-compile from x86_64 Linux
sudo apt-get install gcc-aarch64-linux-gnu
make CC=aarch64-linux-gnu-gcc| Category | Keywords |
|---|---|
| Query Structure | SELECT, DISTINCT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET |
| Data Definition | CREATE, ALTER, TABLE, AS, RENAME, COLUMN, ADD, DROP, TO |
| Data Manipulation | INSERT, INTO, VALUES, UPDATE, SET, DELETE |
| Joins | JOIN, INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, ON |
| Set Operations | UNION, UNION ALL, INTERSECT, EXCEPT |
| Window Functions | ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, OVER, PARTITION |
| Logical Operators | AND, OR, NOT, IN, NOT IN |
| Comparison | =, !=, <>, <, >, <=, >=, BETWEEN |
| Pattern Matching | LIKE, ILIKE |
| Sorting | ASC, DESC |
| Aliases | AS |
Both single-line and multi-line comments are supported:
-- This is a single-line comment
SELECT name, age FROM 'users.csv' -- inline comment
WHERE age > 20
/* This is a
multi-line comment */
SELECT * FROM 'data.csv'
WHERE /* inline block comment */ status = 'active'CASE expressions provide conditional logic in SQL queries. Two forms are supported:
Simple CASE - Compares an expression against multiple values:
CASE expression
WHEN value1 THEN result1
WHEN value2 THEN result2
...
ELSE default_result
ENDSearched CASE - Evaluates conditions sequentially:
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE default_result
ENDExamples:
-- Simple CASE: categorize by exact age match
SELECT name, age,
CASE age
WHEN 25 THEN 'Quarter century'
WHEN 30 THEN 'Thirty'
WHEN 40 THEN 'Forty'
ELSE 'Other age'
END AS age_label
FROM users.csv
-- Searched CASE: categorize by age ranges
SELECT name, age,
CASE
WHEN age < 18 THEN 'Minor'
WHEN age >= 18 AND age < 65 THEN 'Adult'
WHEN age >= 65 THEN 'Senior'
ELSE 'Unknown'
END AS age_group
FROM users.csv
-- CASE with numeric results
SELECT name,
CASE
WHEN age < 30 THEN 1
WHEN age < 40 THEN 2
ELSE 3
END AS tier
FROM users.csv
-- Nested CASE expressions
SELECT name,
CASE
WHEN age < 30 THEN
CASE
WHEN age < 20 THEN 'Very young'
ELSE 'Young'
END
ELSE 'Older'
END AS category
FROM users.csv
-- CASE in WHERE clause
SELECT name, age FROM users.csv
WHERE CASE
WHEN role = 'admin' THEN 1
WHEN age > 50 THEN 1
ELSE 0
END = 1
-- CASE in ORDER BY
SELECT name, age FROM users.csv
ORDER BY CASE
WHEN age < 30 THEN 3
WHEN age < 50 THEN 2
ELSE 1
END, name
-- CASE without ELSE (returns NULL if no match)
SELECT name,
CASE
WHEN age > 100 THEN 'Centenarian'
END AS special_status
FROM users.csvNotes:
- If no WHEN condition matches and there's no ELSE clause, the result is NULL
- CASE expressions can be nested
- CASE works in SELECT, WHERE, ORDER BY, and GROUP BY clauses
- Both string and numeric results are supported
Using CASE Aliases in GROUP BY, ORDER BY, and WHERE:
CQ supports using SELECT column aliases in GROUP BY, ORDER BY, and WHERE clauses, making queries more readable and avoiding repetition of complex expressions:
-- GROUP BY with CASE alias
SELECT CASE WHEN age < 30 THEN 'young' ELSE 'old' END AS category,
COUNT(*) AS cnt
FROM users.csv
GROUP BY category
-- GROUP BY with multiple columns (including aliases)
SELECT role,
CASE WHEN age < 30 THEN 'junior' ELSE 'senior' END AS level,
COUNT(*) AS total,
AVG(age) AS avg_age
FROM users.csv
GROUP BY role, level
ORDER BY total DESC
-- ORDER BY with CASE alias
SELECT name,
CASE WHEN LENGTH(name) > 5 THEN 'long' ELSE 'short' END AS name_len
FROM users.csv
ORDER BY name_len, name
-- WHERE with alias (extension - non-standard SQL)
SELECT age,
CASE WHEN age % 2 = 0 THEN 'even' ELSE 'odd' END AS parity,
name
FROM users.csv
WHERE parity = 'even'
-- Combine all three
SELECT name, age,
CASE WHEN age < 30 THEN 'junior'
WHEN age < 40 THEN 'mid'
ELSE 'senior' END AS level
FROM users.csv
WHERE level != 'junior'
ORDER BY level, ageNote: Using aliases in WHERE is a CQ extension. Standard SQL evaluates WHERE before SELECT, so aliases aren't normally available. However, this feature is convenient for computed columns like CASE expressions.
COUNT(*) -- Count all rows
COUNT(column) -- Count non-null values
SUM(column) -- Sum numeric values
AVG(column) -- Average of numeric values
MIN(column) -- Minimum value
MAX(column) -- Maximum value
STDDEV(column) -- Population standard deviation
MEDIAN(column) -- Median value (50th percentile)CONCAT(str1, str2, ...) -- Concatenate strings
UPPER(str) -- Convert to uppercase
LOWER(str) -- Convert to lowercase
LENGTH(str) -- String length
SUBSTRING(str, start, len) -- Extract substring (1-indexed)
REPLACE(str, from, to) -- Replace all occurrences
COALESCE(val1, val2, ...) -- First non-null valuePOWER(base, exponent) -- Raise base to exponent power
SQRT(number) -- Square root (NULL for negative numbers)
CEIL(number) -- Round up to nearest integer
CEILING(number) -- Alias for CEIL
FLOOR(number) -- Round down to nearest integer
ROUND(number, [decimals]) -- Round to specified decimal places (default 0)
ABS(number) -- Absolute value
EXP(number) -- e raised to the power of number
LN(number) -- Natural logarithm (NULL for numbers ≤ 0)
LOG(number) -- Alias for LN
MOD(dividend, divisor) -- Modulo (remainder of division, NULL if divisor is 0)+ -- Addition
- -- Subtraction
* -- Multiplication
/ -- Division
% -- Modulo
& -- Bitwise AND
| -- Bitwise OR
^ -- Bitwise XOROperator Precedence (highest to lowest):
- Parentheses
() - Multiplication, Division, Modulo
*,/,% - Addition, Subtraction
+,- - Bitwise
&,|,^
cq [OPTIONS]
Options:
-h Show help message
-q <query> SQL query to execute (use '-' to read from stdin)
-f <file> Read SQL query from file
-o <file> Write results as CSV to output file
-c Print count of rows
-p Print results as formatted table to stdout
-v Print results in vertical format (one column per line)
-s <char> Field separator for input CSV (default: ',')
-d <char> Output delimiter for -o option (default: ',')
-F, --force Allow DELETE without WHERE clause
Examples:
# Print formatted table
cq -q "SELECT name, age WHERE age > 30" -p
# Read query from file
cq -f query.sql -p
# Read query from stdin (piping)
echo "SELECT * WHERE active = 1" | cq -q - -p
cat query.sql | cq -q - -o output.csv
# Save to CSV file
cq -q "SELECT * WHERE active = 1" -o output.csv
# Count matching rows
cq -q "SELECT * WHERE role = 'admin'" -c
# Custom delimiter (TSV input)
cq -q "SELECT * FROM data.tsv" -s '\t' -p
# Vertical output (useful for wide tables)
cq -q "SELECT * FROM data.csv LIMIT 5" -v
# Combine options
cq -q "SELECT name FROM data.csv WHERE age > 25" -o filtered.csv -cThe engine automatically infers types from CSV data:
| Type | Description | Examples |
|---|---|---|
INTEGER |
Whole numbers | 42, -17, 0 |
DOUBLE |
Floating point | 3.14, -0.5, 1.23e-4 |
STRING |
Text values | "Alice", "admin" |
NULL |
Missing values | Empty CSV cells |
- Header row (automatically detected)
- Custom delimiters (
,,\t,;, etc.) - Quoted fields (
"value with, comma") - Escaped quotes (
"quote ""inside"" field") - Large files (memory-mapped I/O)
id,name,age,role,height,active
1,Alice,25,admin,165.50,1
2,Bob,30,user,178.20,1
3,Charlie,35,moderator,172.00,0SELECT DISTINCT gender FROM users.csv
SELECT DISTINCT city, country FROM addresses.csv-- LIKE: case-sensitive pattern matching
SELECT name FROM users.csv WHERE name LIKE 'A%' -- Starts with 'A'
SELECT email FROM users.csv WHERE email LIKE '%@gmail.com' -- Ends with
SELECT code FROM products.csv WHERE code LIKE 'USB-___' -- Exactly 3 chars after USB-
SELECT name FROM users.csv WHERE name LIKE '%john%' -- Contains 'john'
-- ILIKE: case-insensitive pattern matching
SELECT name FROM users.csv WHERE name ILIKE 'alice' -- Matches Alice, ALICE, alice
SELECT email FROM users.csv WHERE email ILIKE '%@EXAMPLE.COM'
-- Wildcards:
-- % matches any sequence of characters (including empty)
-- _ matches exactly one character-- UNION: Combine results, remove duplicates
SELECT name FROM customers_2023.csv
UNION
SELECT name FROM customers_2024.csv
-- UNION ALL: Combine results, keep all rows (including duplicates)
SELECT product FROM sales_q1.csv
UNION ALL
SELECT product FROM sales_q2.csv
-- INTERSECT: Return only rows that appear in both queries
SELECT email FROM newsletter.csv
INTERSECT
SELECT email FROM customers.csv
-- EXCEPT: Return rows from first query that are NOT in second query
SELECT email FROM all_users.csv
EXCEPT
SELECT email FROM inactive_users.csv
-- Chaining multiple set operations
SELECT id FROM table_a.csv
UNION
SELECT id FROM table_b.csv
INTERSECT
SELECT id FROM table_c.csv
-- Set operations with WHERE clauses
SELECT name FROM users.csv WHERE age < 30
UNION
SELECT name FROM users.csv WHERE role = 'admin'Set Operation Rules:
- Both queries must have the same number of columns
- Column names come from the first query
- UNION removes duplicates, UNION ALL keeps all rows
- INTERSECT returns common rows (duplicates removed)
- EXCEPT returns rows in first but not second (duplicates removed)
- Operations can be chained: (A UNION B) INTERSECT C
-- INNER JOIN
SELECT t1.name, t2.email
FROM users.csv AS t1
INNER JOIN emails.csv AS t2 ON t1.id = t2.user_id
-- LEFT JOIN
SELECT t1.name, t2.phone
FROM users.csv AS t1
LEFT JOIN contacts.csv AS t2 ON t1.id = t2.id
-- FULL JOIN
SELECT * FROM table1.csv FULL JOIN table2.csv ON table1.id = table2.id-- Subquery in FROM (derived table)
SELECT * FROM (SELECT name, age FROM users.csv WHERE age > 25) AS sub
-- Subquery in WHERE with IN
SELECT name FROM users.csv WHERE role IN (SELECT role FROM admins.csv)
-- Scalar subquery in SELECT
SELECT name, (SELECT MAX(age) FROM users.csv) AS max_age FROM users.csv
-- Correlated subquery
SELECT name, age FROM users.csv AS t1
WHERE age > (SELECT AVG(age) FROM users.csv WHERE role = t1.role)SELECT role, COUNT(*) AS cnt, AVG(age) AS avg_age
FROM users.csv
GROUP BY role
HAVING COUNT(*) > 5 AND AVG(age) > 30
ORDER BY cnt DESCWindow functions perform calculations across a set of rows that are related to the current row. Unlike aggregate functions with GROUP BY, window functions retain all rows in the result.
Supported Window Functions:
ROW_NUMBER() -- Sequential number for each row within partition
RANK() -- Rank with gaps for tied values
DENSE_RANK() -- Rank without gaps for tied values
LAG(column) -- Access previous row's value
LEAD(column) -- Access next row's value
SUM(column) -- Running/cumulative sum
AVG(column) -- Running/cumulative average
COUNT(*) -- Running/cumulative countBasic Window Function:
-- Add row numbers to all rows
SELECT name, age,
ROW_NUMBER() OVER (ORDER BY age) AS row_num
FROM users.csv
-- Rank employees by salary
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees.csvPARTITION BY - Separate Window per Group:
-- Row number within each department
SELECT department, name, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees.csv
-- Running total per category
SELECT category, product, price,
SUM(price) OVER (PARTITION BY category ORDER BY price) AS running_total
FROM products.csvLAG and LEAD - Access Adjacent Rows:
-- Compare with previous row
SELECT name, age,
LAG(age) OVER (ORDER BY age) AS prev_age,
age - LAG(age) OVER (ORDER BY age) AS age_diff
FROM users.csv
-- Compare with next row
SELECT date, value,
LEAD(value) OVER (ORDER BY date) AS next_value,
LEAD(value) OVER (ORDER BY date) - value AS change
FROM timeseries.csvRunning Aggregates:
-- Running sum and average
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_sum,
AVG(amount) OVER (ORDER BY date) AS running_avg
FROM sales.csv
-- Running count
SELECT product, sale_date,
COUNT(*) OVER (ORDER BY sale_date) AS total_sales_so_far
FROM orders.csvNotes:
- All window functions require an OVER clause
- ORDER BY within OVER determines row ordering for the calculation
- PARTITION BY divides rows into groups (optional)
- Window functions are evaluated after WHERE, GROUP BY, and HAVING
- Multiple window functions can be used in the same query
- LAG/LEAD respect ORDER BY within partitions
- Running aggregates (SUM, AVG, COUNT) are cumulative and progressive
Examples:
# Basic window function
./build/cq -q "SELECT name, age, ROW_NUMBER() OVER (ORDER BY age) AS row_num FROM 'data/users.csv' ORDER BY age" -p
# Running sum
./build/cq -q "SELECT name, age, SUM(age) OVER (ORDER BY age) AS running_sum FROM 'data/users.csv' ORDER BY age" -p
# PARTITION BY with LAG
./build/cq -q "SELECT role, name, age, LAG(age) OVER (PARTITION BY role ORDER BY age) AS prev_age FROM 'data/users.csv'" -pSELECT UPPER(CONCAT(SUBSTRING(name, 1, 1), '. ', surname)) AS formatted
FROM users.csv
SELECT LENGTH(REPLACE(UPPER(name), 'A', '')) AS modified_len
FROM users.csv-- Simple SELECT
SELECT name, age FROM data.csv
-- SELECT with WHERE
SELECT name, age FROM data.csv WHERE age > 30 AND role = 'admin'
-- SELECT with ORDER BY
SELECT name, age FROM data.csv ORDER BY age DESC
-- SELECT with LIMIT
SELECT name, age FROM data.csv LIMIT 10
-- SELECT with LIMIT and OFFSET (pagination)
SELECT name, age FROM data.csv ORDER BY age LIMIT 10 OFFSET 20-- Basic arithmetic in SELECT
SELECT name, age, age * 2 AS double_age, age + 10 AS age_plus_10
FROM data.csv
-- Arithmetic with proper precedence
SELECT price + tax, quantity * (price + tax) AS total
FROM orders.csv
-- Modulo for finding patterns
SELECT name, age FROM data.csv WHERE age % 10 = 0
-- Bitwise operations
SELECT id, flags & 15 AS lower_bits FROM data.csv WHERE (flags & 16) > 0-- Concatenation
SELECT CONCAT(name, ' - ', role) AS info FROM data.csv
-- Case conversion
SELECT UPPER(name) AS upper_name, LOWER(role) AS lower_role FROM data.csv
-- String manipulation
SELECT SUBSTRING(name, 1, 3) AS initials,
LENGTH(name) AS name_len,
REPLACE(role, 'admin', 'administrator') AS new_role
FROM data.csv
-- Functions in WHERE
SELECT name FROM data.csv WHERE LENGTH(name) > 5 AND UPPER(role) = 'ADMIN'-- Power and roots
SELECT POWER(2, 10) AS power_result FROM data.csv -- 1024
SELECT SQRT(age) AS sqrt_age FROM data.csv WHERE age >= 0
-- Rounding functions
SELECT CEIL(height) AS rounded_up FROM data.csv
SELECT FLOOR(height) AS rounded_down FROM data.csv
SELECT ROUND(height, 1) AS rounded_1dp FROM data.csv
SELECT ROUND(height) AS rounded_int FROM data.csv
-- Absolute value
SELECT ABS(age - 30) AS age_diff FROM data.csv
-- Exponential and logarithm
SELECT EXP(1) AS e FROM data.csv -- 2.718...
SELECT LN(age) AS log_age FROM data.csv WHERE age > 0
-- Modulo
SELECT age, MOD(age, 10) AS last_digit FROM data.csv
-- Combined math operations
SELECT ROUND(SQRT(POWER(x, 2) + POWER(y, 2)), 2) AS distance FROM coordinates.csv
SELECT FLOOR(LN(population)) AS log_pop FROM cities.csv
-- Math in WHERE clause
SELECT name FROM data.csv WHERE SQRT(age) > 5
SELECT name FROM data.csv WHERE MOD(age, 5) = 0
SELECT name FROM data.csv WHERE POWER(age, 2) > 1000-- LIKE: case-sensitive
SELECT * FROM products.csv WHERE name LIKE 'iPhone%'
SELECT * FROM users.csv WHERE email LIKE '%@company.com'
SELECT * FROM codes.csv WHERE code LIKE 'A__B' -- A, 2 chars, B
-- ILIKE: case-insensitive
SELECT * FROM users.csv WHERE name ILIKE 'john%'
SELECT * FROM emails.csv WHERE email ILIKE '%@GMAIL.COM'
-- Combined with other conditions
SELECT name, email FROM users.csv
WHERE name LIKE 'A%' AND email ILIKE '%@example.com'
-- Pattern in subquery
SELECT * FROM orders.csv
WHERE customer_id IN (
SELECT id FROM customers.csv WHERE name LIKE 'Smith%'
)-- Basic aggregation
SELECT COUNT(*) FROM data.csv WHERE active = 1
-- GROUP BY with aggregates
SELECT role, COUNT(*) AS count, AVG(age) AS avg_age, MAX(height) AS max_height
FROM data.csv
GROUP BY role
-- Statistical aggregates
SELECT
AVG(age) AS mean_age,
STDDEV(age) AS stddev_age,
MEDIAN(age) AS median_age,
MIN(age) AS min_age,
MAX(age) AS max_age
FROM data.csv
-- Statistical aggregates with GROUP BY
SELECT role, COUNT(*) AS cnt, AVG(age) AS avg, STDDEV(age) AS stdev, MEDIAN(age) AS median
FROM data.csv
GROUP BY role
-- Aggregates without GROUP BY (entire table as one group)
SELECT COUNT(*) AS total, AVG(age) AS avg_age, MIN(age) AS min_age
FROM data.csv
-- HAVING clause
SELECT role, COUNT(*) AS cnt FROM data.csv
GROUP BY role
HAVING COUNT(*) > 2
ORDER BY cnt DESC-- UNION: Combine and remove duplicates
SELECT name, email FROM customers_2023.csv
UNION
SELECT name, email FROM customers_2024.csv
-- UNION ALL: Keep all rows including duplicates
SELECT product_id FROM orders_jan.csv
UNION ALL
SELECT product_id FROM orders_feb.csv
-- INTERSECT: Find common rows
SELECT user_id FROM premium_users.csv
INTERSECT
SELECT user_id FROM active_users.csv
-- EXCEPT: Find differences
SELECT email FROM all_subscribers.csv
EXCEPT
SELECT email FROM bounced_emails.csv
-- Multiple operations chained
SELECT id FROM group_a.csv
UNION SELECT id FROM group_b.csv
INTERSECT SELECT id FROM active.csv
-- With complex queries
SELECT name FROM users.csv WHERE age > 30 AND city = 'NYC'
UNION
SELECT name FROM admins.csv WHERE role = 'super_admin'
EXCEPT
SELECT name FROM banned.csv-- Join with aggregation
SELECT t.role, COUNT(*) AS cnt, MAX(t.height) AS max_h
FROM 'data/users.csv' AS t
LEFT JOIN 'data/emails.csv' AS e ON t.id = e.user_id
WHERE t.age > 25
GROUP BY t.role
HAVING COUNT(*) > 1
ORDER BY max_h DESC
LIMIT 10
-- Subquery in FROM with filtering
SELECT * FROM (
SELECT name, age, role FROM data.csv WHERE age > 18
) AS adults
WHERE role = 'user'
-- Multiple subqueries
SELECT name, age,
(SELECT AVG(age) FROM data.csv) AS avg_age,
(SELECT MAX(age) FROM data.csv WHERE role = t1.role) AS role_max
FROM data.csv AS t1
WHERE age > (SELECT AVG(age) FROM data.csv)make testtests/
├── test_alter_table.c # ALTER TABLE operations (8 tests)
├── test_arithmetic.c # Arithmetic expressions (22 tests)
├── test_create_table.c # CREATE TABLE operations (8 tests)
├── test_csv.c # CSV loading and parsing
├── test_distinct.c # DISTINCT keyword (4 tests)
├── test_dml.c # INSERT/UPDATE/DELETE operations (8 tests)
├── test_evaluator.c # Query evaluation
├── test_extended_operators.c # Modulo, bitwise, NOT (23 tests)
├── test_like.c # LIKE and ILIKE operators (8 tests)
├── test_load_performance.c # Performance benchmarking
├── test_parser.c # SQL parsing
├── test_set_ops.c # UNION, INTERSECT, EXCEPT (8 tests)
├── test_tokenizer.c # Lexical analysis
└── test_where_functions.c # Functions in WHERE (10 tests)
# Run specific test
./build/test_arithmetic
./build/test_create_table
./build/test_distinct
./build/test_dml
./build/test_evaluatorTested with 1,000,000 rows:
- Load time: ~273ms
- Throughput: ~3.67M rows/second
- Memory usage: ~113 MB
- Uses memory-mapped I/O for large CSV files
- Efficient value storage (integers vs strings)
- Proper memory cleanup (no leaks)
cq/
├── include/ # Header files
│ ├── csv_reader.h # CSV parsing and data structures
│ ├── evaluator.h # Query execution engine
│ ├── parser.h # SQL parser and AST
│ ├── tokenizer.h # Lexical analyzer
│ └── utils.h # Utility functions
├── src/ # Source files
│ ├── csv_reader.c # CSV file loading with mmap
│ ├── evaluator.c # Query evaluation and execution
│ ├── main.c # CLI interface
│ ├── parser.c # Recursive descent SQL parser
│ ├── tokenizer.c # Tokenization and lexical analysis
│ └── utils.c # Helper functions
├── tests/ # Test suite
├── data/ # Test data files
└── Makefile # Build configuration
SQL String
↓
Tokenizer → [Tokens]
↓
Parser → [Abstract Syntax Tree]
↓
Evaluator → [ResultSet]
↓
Output (Table/CSV/Count)
-- NULL in comparisons evaluates to false
SELECT * FROM data.csv WHERE email = NULL -- Returns no rows
-- Use scalar functions for NULL handling
SELECT COALESCE(email, 'N/A') AS email FROM data.csv
-- NULL in aggregates are ignored
SELECT AVG(age) FROM data.csv -- Ignores NULL ages-- Strings compared lexicographically
WHERE name > 'M'
-- Numbers automatically promoted to doubles when needed
SELECT age / 3.0 FROM data.csv -- Result is always double
-- Integer division returns integer if no remainder
SELECT 10 / 2 -- Returns 5 (integer)
SELECT 10 / 3 -- Returns 3.333... (double)-- Use table aliases to avoid ambiguity in joins
SELECT t1.name, t2.email
FROM users.csv AS t1
JOIN contacts.csv AS t2 ON t1.id = t2.id
-- Column without qualifier tries all tables
SELECT name FROM users.csv -- Works if name exists"Error: Query is required (use -q)"
- Solution: Always provide the
-qflag with your query
"Error: Failed to load CSV file"
- Check file path exists
- Verify file permissions
- Ensure CSV format is valid
"Parse error" / "Syntax error"
- Check SQL syntax matches supported features
- Verify all parentheses are balanced
- Check for typos in keywords
# Enable detailed logging (if implemented)
cq -q "SELECT * FROM data.csv" -p --verbose
# Check file with small query first
cq -q "SELECT * FROM data.csv LIMIT 5" -pcc -g -Wall -W -Iinclude src/*.c -o build/cq_debugmake address_sanitizer- C99 standard
- 4-space indentation
- Clear variable names
- Comments for complex logic
- GCC 4.8+
- Clang 3.5+
- Any C99-compliant compiler
- Standard C library only
- No external dependencies
- POSIX for memory-mapped I/O
- DISTINCT keyword (deduplication)
- LIKE/ILIKE operators (pattern matching)
- UNION/INTERSECT/EXCEPT (set operations)
- Subqueries (FROM, WHERE, scalar, correlated)
- Arithmetic expressions with precedence
- Scalar and aggregate functions
- Statistical aggregates (STDDEV, MEDIAN)
- All join types (INNER, LEFT, RIGHT, FULL)
- Data manipulation (INSERT, UPDATE, DELETE)
- CREATE TABLE (save query results, define schema)
- ALTER TABLE (rename/add/drop columns)
- Read queries from file (-f option)
- Read queries from stdin (piping support)
- SQL comments (-- and /* */)
- CASE expressions (simple and searched)
- Window functions (ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, running aggregates)
- Index support for large files
- Query optimization
- Watch mode (cq --watch "SELECT * FROM access.log WHERE status=500")
- Output formatter (cq -o json "SELECT * FROM users.csv" | jq; cq -o yaml "SELECT * FROM config.csv" | yq)
- Remote streaming (cq "SELECT * FROM https://website/export.csv WHERE year=2024"; cq "SELECT * FROM s3://bucket/data.csv LIMIT 100")
- Interactive REPL (cq --interactive)
- Git versioned data query (cq --git HEAD~5 "SELECT * FROM users.csv WHERE active=true")
- Date type support
-- Basic query
SELECT name, age FROM data.csv WHERE age > 30
-- With functions
SELECT UPPER(name), LENGTH(name) AS len FROM data.csv
-- Complex join with aggregates
SELECT t.role, MAX(t.height) AS max_h, LOWER(t2.email) AS email
FROM './data.csv' AS t
LEFT JOIN './emails.csv' AS t2 ON t.id = t2.id
WHERE t.age > 25
GROUP BY t.role
ORDER BY MAX(t.height) DESC
LIMIT 5
-- CREATE TABLE - Save query results
CREATE TABLE 'active_users.csv' AS
SELECT * FROM 'users.csv' WHERE active = 1
-- CREATE TABLE - Aggregation report
CREATE TABLE 'summary.csv' AS
SELECT role, COUNT(*) as cnt, AVG(age) as avg_age
FROM 'users.csv'
GROUP BY role
-- CREATE TABLE - Empty table with schema
CREATE TABLE 'new_table.csv' (id, name, email, created_at)
-- String manipulation
SELECT
name,
CONCAT(name, ' - ', role) AS info,
SUBSTRING(name, 1, 3) AS short,
REPLACE(role, 'admin', 'ADMIN') AS new_role
FROM data.csv
-- Pagination
SELECT name, age
FROM data.csv
ORDER BY age DESC
LIMIT 10 OFFSET 20 -- Page 3 (20 records skipped)
-- SELECT * variations
SELECT * FROM data.csv LIMIT 5
SELECT *, UPPER(name) AS name_upper FROM data.csv
SELECT id, *, age FROM data.csv WHERE age > 30
-- Arithmetic expressions
SELECT name, age * 2 AS double_age, height / 100 AS height_m FROM data.csv
SELECT price + tax AS total, quantity * price AS subtotal FROM orders.csv
-- Nested functions
SELECT UPPER(CONCAT(name, ' - ', role)) AS full_info FROM data.csv
SELECT REPLACE(UPPER(CONCAT(name, ' ', role)), 'ADMIN', 'ADMINISTRATOR') FROM data.csv
SELECT LENGTH(CONCAT(name, role)) AS total_len FROM data.csv
SELECT SUBSTRING(UPPER(name), 1, 3) AS short FROM data.csv
-- Functions in WHERE clause
SELECT name FROM data.csv WHERE LENGTH(name) > 5
SELECT name FROM data.csv WHERE UPPER(role) = 'ADMIN'
SELECT name FROM data.csv WHERE LENGTH(CONCAT(name, role)) > 10
SELECT name FROM data.csv WHERE SUBSTRING(name, 1, 1) = 'A'
-- BETWEEN operator (inclusive on both ends)
SELECT name, age FROM 'users.csv' WHERE age BETWEEN 25 AND 35
SELECT * FROM 'products.csv' WHERE price BETWEEN 10.0 AND 50.0
SELECT name FROM 'users.csv' WHERE name BETWEEN 'A' AND 'M'
SELECT * FROM 'sales.csv' WHERE date BETWEEN '2024-01-01' AND '2024-12-31'
SELECT name, age FROM 'users.csv' WHERE age * 2 BETWEEN 50 AND 70
-- CASE expressions
SELECT name, age,
CASE age
WHEN 25 THEN 'Quarter century'
WHEN 30 THEN 'Thirty'
ELSE 'Other'
END AS age_label
FROM data.csv
SELECT name,
CASE
WHEN age < 18 THEN 'Minor'
WHEN age < 65 THEN 'Adult'
ELSE 'Senior'
END AS age_group
FROM data.csv
SELECT name, age,
CASE
WHEN age < 30 THEN 1
WHEN age < 50 THEN 2
ELSE 3
END AS tier
FROM data.csv
SELECT * FROM data.csv
WHERE CASE WHEN age < 30 THEN 1 ELSE 0 END = 1
SELECT name FROM data.csv
ORDER BY CASE WHEN age < 30 THEN 2 WHEN age < 50 THEN 1 ELSE 0 END
-- Scalar functions with GROUP BY
SELECT role, UPPER(role) AS upper_role FROM data.csv GROUP BY role
SELECT role, LENGTH(role) AS len, COUNT(*) FROM data.csv GROUP BY role
SELECT role, UPPER(CONCAT('Role: ', role)) AS formatted FROM data.csv GROUP BY role
SELECT role, LENGTH(UPPER(role)) AS len, SUM(age) FROM data.csv GROUP BY role
-- HAVING clause
SELECT role, COUNT(*) FROM data.csv GROUP BY role HAVING COUNT(*) > 2
SELECT role, AVG(age) AS avg_age FROM data.csv GROUP BY role HAVING AVG(age) > 30
SELECT role, COUNT(*) AS cnt FROM data.csv GROUP BY role HAVING cnt >= 2 ORDER BY cnt DESC
SELECT role, COUNT(*), AVG(age) FROM data.csv GROUP BY role HAVING COUNT(*) >= 2 AND AVG(age) > 30
-- Subqueries in FROM clause
SELECT * FROM (SELECT name, age FROM data.csv) AS sub WHERE age > 30
SELECT name FROM (SELECT * FROM data.csv WHERE role = 'admin') AS sub
SELECT role, COUNT(*) FROM (SELECT * FROM data.csv WHERE age > 25) AS sub GROUP BY role
SELECT * FROM (SELECT role, AVG(age) AS avg_age FROM data.csv GROUP BY role HAVING AVG(age) > 30) AS sub
-- Subqueries in WHERE clause (IN operator)
SELECT name FROM data.csv WHERE role IN (SELECT role FROM data.csv WHERE age > 30)
SELECT name FROM data.csv WHERE name IN (SELECT name FROM data.csv WHERE role = 'admin')
SELECT name, age FROM data.csv WHERE age IN (SELECT age FROM data.csv WHERE age > 40)
SELECT name FROM data.csv WHERE role IN (SELECT role FROM data.csv WHERE age > 30) AND age < 40
-- Scalar subqueries in SELECT
SELECT name, (SELECT MAX(age) FROM data.csv) AS max_age FROM data.csv LIMIT 1
SELECT name, (SELECT AVG(age) FROM data.csv) AS avg_age FROM data.csv WHERE age > 25
SELECT name, (SELECT COUNT(*) FROM data.csv WHERE role = 'admin') AS admin_count FROM data.csv
SELECT name, (SELECT MIN(age) FROM data.csv) AS min, (SELECT MAX(age) FROM data.csv) AS max FROM data.csv
-- Scalar subqueries in WHERE
SELECT name FROM data.csv WHERE age = (SELECT MAX(age) FROM data.csv)
SELECT name FROM data.csv WHERE age > (SELECT AVG(age) FROM data.csv)
SELECT name FROM data.csv WHERE age < (SELECT AVG(age) FROM data.csv WHERE role = 'admin')
-- Correlated subqueries
SELECT name, age, role FROM data.csv AS t1
WHERE age > (SELECT AVG(age) FROM data.csv WHERE role = t1.role)
SELECT name, role, (SELECT COUNT(*) FROM data.csv WHERE role = t1.role) AS role_count
FROM data.csv AS t1
SELECT name, age FROM data.csv AS t1
WHERE age = (SELECT MAX(age) FROM data.csv WHERE role = t1.role)-- Simple limit
LIMIT 10
-- With offset (standard SQL)
LIMIT 10 OFFSET 5
-- With offset (MySQL style)
LIMIT 5, 10 -- offset 5, limit 10Full Support - Complex arithmetic with proper operator precedence and parentheses
SELECT age + 10, age * 2, (age + 5) * 2 FROM data.csv
SELECT name, age FROM data.csv WHERE age * 2 > 60
SELECT (age + 10) * 2 / 5 FROM data.csv WHERE (age + 5) > 30
-- Modulo operator
SELECT age % 10 FROM data.csv WHERE age % 2 = 0
-- Bitwise operators
SELECT age & 15, age | 1 FROM data.csv WHERE (age & 16) > 0
SELECT id ^ 255 AS xor_result FROM data.csv -- XOR operationOperators:
- Arithmetic:
+,-,*,/,%(modulo) - Bitwise:
&(AND),|(OR),^(XOR) - Operator precedence: parentheses >
*,/,%>+,->&,|,^ - Parentheses for grouping:
(2 + 3) * 4= 20 - Works in SELECT and WHERE clauses
- Mixed integer and floating-point arithmetic
- Type preservation: integer operations return integers when possible
- Bitwise operations require integer operands
NOT Operator - Logical negation of conditions
SELECT name FROM data.csv WHERE NOT age > 30
SELECT name FROM data.csv WHERE NOT (age > 20 AND age < 30)
SELECT name FROM data.csv WHERE age NOT IN (25, 30, 35)Features:
NOTprefix operator for negating conditions- Works with parenthesized complex conditions
NOT INfor list/subquery exclusion- Can be combined with
AND/OR
-- Without parentheses: 2 + (3 * 4) = 14
SELECT 2 + 3 * 4 FROM test_data.csv
-- Result: 14 for all rows
-- With parentheses: (2 + 3) * 4 = 20
SELECT (2 + 3) * 4 FROM test_data.csv
-- Result: 20 for all rows-- Nested operations with columns
SELECT name, age, (age + 10) * 2 - 5 FROM test_data.csv
-- Result: For Alice (age 25): (25 + 10) * 2 - 5 = 65
-- WHERE with arithmetic
SELECT name, age FROM test_data.csv WHERE (age + 5) * 2 > 70
-- Result: Charlie (35), Eve (42), Grace (33)-- Integer arithmetic stays integer when possible
SELECT age / 2 FROM test_data.csv WHERE age = 30
-- Result: 15 (integer)
-- Mixed or non-whole results become double
SELECT age / 3 FROM test_data.csv WHERE age = 25
-- Result: 8.33 (double)$ ./build/test_load_performance
=== CSV Load Performance Test ===
Testing file: data/bigdata.csv
Results:
--------
Rows loaded: 1000000
Columns: 5
Total time: 238.79 ms
Time per row: 0.0002 ms
Rows per second: 4187796
Breakdown:
File I/O + CSV parsing: 238.79 ms (100%)
Memory usage (approximate):
Total: 113.27 MB
Per row: 0.12 KB
=== Test completed successfully ===Version: 1.0
Author: Mario
Last Updated: December 2025
MIT License - see LICENSE file for details
