
Commit 49f11d2

Merge branch 'blobs'
2 parents c802f5d + 3c0a4fc commit 49f11d2

5 files changed: +135 -19 lines changed

docs/deployments/configuration.md

Lines changed: 10 additions & 1 deletion
@@ -821,7 +821,7 @@ storage:
   prefetchWrites: true
 ```
 
-`path` - _Type_: string; _Default_: `<rootPath>/schema`
+`path` - _Type_: string; _Default_: `<rootPath>/database`
 
 The `path` configuration sets where all database files should reside.
 
@@ -831,6 +831,15 @@ storage:
 ```
 _**Note:**_ This configuration applies to all database files, which includes system tables that are used internally by HarperDB. For this reason, if you wish to use a non-default `path` value, you must move any existing schemas into your `path` location. Existing schemas are likely to include the system schema, which can be found at `<rootPath>/schema/system`.
 
+`blobPaths` - _Type_: string or array of strings; _Default_: `<rootPath>/blobs`
+
+The `blobPaths` configuration sets where all blob files should reside. This can be an array of paths, and if there are multiple, the blobs will be distributed across the paths.
+
+```yaml
+storage:
+  blobPaths:
+    - /users/harperdb/big-storage
+```
 
 `pageSize` - _Type_: number; _Default_: Defaults to the default page size of the OS
 
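For the multi-path case described above, a minimal sketch of `blobPaths` spreading blobs across two volumes (the example paths are illustrative, not from the commit):

```yaml
storage:
  blobPaths:
    - /volumes/fast-ssd/blobs
    - /volumes/bulk-hdd/blobs
```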

docs/developers/applications/defining-schemas.md

Lines changed: 1 addition & 0 deletions
@@ -201,6 +201,7 @@ HarperDB supports the following field types in addition to user defined (object)
 * `Any`: Any primitive, object, or array is allowed.
 * `Date`: A Date object.
 * `Bytes`: Binary data (as a Buffer or Uint8Array).
+* `Blob`: Binary data designed for large blocks of data that can be streamed. It is recommended that you use this for binary data that will typically be larger than 20KB.
 
 #### Renaming Tables
 
docs/developers/operations-api/clustering.md

Lines changed: 21 additions & 18 deletions
@@ -128,33 +128,36 @@ _Operation is restricted to super_user roles only_
   "type": "cluster-status",
   "connections": [
     {
-      "url": "wss://server-two:9925",
-      "subscriptions": [
-        {
-          "schema": "dev",
-          "table": "my-table",
-          "publish": true,
-          "subscribe": true
-        }
-      ],
-      "name": "server-two",
+      "replicateByDefault": true,
+      "replicates": true,
+      "url": "wss://server-2.domain.com:9933",
+      "name": "server-2.domain.com",
+      "subscriptions": null,
       "database_sockets": [
         {
-          "database": "dev",
+          "database": "data",
           "connected": true,
-          "latency": 0.84197798371315,
-          "threadId": 1,
+          "latency": 0.70,
+          "thread_id": 1,
           "nodes": [
-            "server-two"
-          ]
+            "server-2.domain.com"
+          ],
+          "lastCommitConfirmed": "Wed, 12 Feb 2025 19:09:34 GMT",
+          "lastReceivedRemoteTime": "Wed, 12 Feb 2025 16:49:29 GMT",
+          "lastReceivedLocalTime": "Wed, 12 Feb 2025 16:50:59 GMT",
+          "lastSendTime": "Wed, 12 Feb 2025 16:50:59 GMT"
         }
       ]
     }
   ],
-  "node_name": "server-one",
+  "node_name": "server-1.domain.com",
   "is_enabled": true
 }
 ```
+There is a separate socket for each database for each node. Each node is represented in the connections array, and each database connection to that node is represented in the `database_sockets` array. Additional timing statistics include:
+* `lastCommitConfirmed`: When a commit is sent out, it should receive a confirmation from the remote server; this is the timestamp of the last confirmation received for an outgoing commit.
+* `lastReceivedRemoteTime`: The timestamp of the transaction that was last received. The timestamp is from when the original transaction occurred.
+* `lastReceivedLocalTime`: The local time when the last transaction was received. If this differs from `lastReceivedRemoteTime`, there is a delay between the original transaction and its receipt, meaning this node is probably catching up (behind).
+* `sendingMessage`: The timestamp of the transaction that is actively being sent. This won't exist if the replicator is waiting for the next transaction to send.
 
 ---
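For reference, the response above is produced by the standard operations-API request body (the operation name matches the `cluster_status` operation referenced in the 4.5 release notes below):

```json
{
    "operation": "cluster_status"
}
```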

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
Blobs are binary large objects that can be used to store any type of unstructured binary data and are designed for large content. Blobs support streaming and offer better performance for content larger than about 20KB. Blobs are built on the native JavaScript `Blob` type, and HarperDB extends the native `Blob` type for integrated storage with the database. To use blobs, you would generally want to declare a field as a `Blob` type in your schema:

```graphql
type MyTable {
	id: Any! @primaryKey
	data: Blob
}
```

You can then create a blob, which writes the binary data to disk, and include it (as a reference) in a record. For example, you can create a record with a blob like:

```javascript
let blob = await createBlob(largeBuffer);
await MyTable.put({ id: 'my-record', data: blob });
```

The `data` attribute in this example is a blob reference, and can be used like any other attribute in the record, but it is stored separately, and the data must be accessed asynchronously. You can retrieve the blob data with the standard `Blob` methods:

```javascript
let buffer = await blob.bytes();
```

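Since blobs are built on the native JavaScript `Blob` type, the other standard accessors should work as well; a minimal sketch, assuming the standard `Blob` interface:

```javascript
let text = await blob.text();               // decode the content as UTF-8 text
let arrayBuffer = await blob.arrayBuffer(); // the raw content as an ArrayBuffer
let size = blob.size;                       // content length in bytes
```
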
If you are creating a resource method, you can return a response object with a blob as the body:

```javascript
export class MyEndpoint extends MyTable {
	async get() {
		return {
			status: 200,
			headers: {},
			body: this.data, // this.data is a blob
		};
	}
}
```

One of the important characteristics of blobs is that they natively support asynchronous streaming of data. This is important for both creation and retrieval of large data. When we create a blob with `createBlob`, the returned blob will create the storage entry, but the data will be streamed to storage. This means that you can create a blob from a buffer or from a stream. You can also create a record that references a blob before the blob is fully written to storage. For example, you can create a blob from a stream:

```javascript
let blob = await createBlob(stream);
// at this point the blob exists, but the data is still being written to storage
await MyTable.put({ id: 'my-record', data: blob });
// we now have written a record that references the blob
let record = await MyTable.get('my-record');
// we now have a record that gives us access to the blob; we can asynchronously access
// or stream its data, and it will become available as the stream is written to the blob
let dataStream = record.data.stream();
```

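The stream returned by `stream()` follows the web `ReadableStream` interface (assuming native `Blob` behavior), so chunks can be consumed as they arrive, even while the blob is still being written; a minimal sketch:

```javascript
let reader = record.data.stream().getReader();
while (true) {
	const { done, value } = await reader.read();
	if (done) break;
	// each chunk becomes readable as soon as it has been written to the blob
	console.log(`received chunk of ${value.byteLength} bytes`);
}
```
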
This can be powerful functionality for large media content, where content can be streamed into storage while simultaneously being streamed out to users in real time as it is received.

Alternatively, we can wait for the blob to be fully written to storage before creating a record that references it:

```javascript
let blob = await createBlob(stream);
// at this point the blob exists, but the data may not be fully written to storage yet
await blob.save(MyTable);
// we now know the blob is fully written to storage
await MyTable.put({ id: 'my-record', data: blob });
```

Note that this means blobs are _not_ atomic or [ACID](https://en.wikipedia.org/wiki/ACID) compliant; streaming functionality achieves the opposite behavior of ACID/atomic writes, which would prevent access to data while it is being written.

### Error Handling
Because blobs can be streamed and referenced prior to their completion, there is a chance that an error or interruption could occur while streaming data to the blob (after the record is committed). We can create an error handler for the blob to handle the case of an interrupted blob:

```javascript
export class MyEndpoint extends MyTable {
	async get() {
		let blob = this.data;
		blob.on('error', () => {
			// if this is a caching table, we may want to invalidate or delete this record:
			this.invalidate();
		});
		return {
			status: 200,
			headers: {},
			body: blob
		};
	}
}
```

See the [configuration](../../deployments/configuration.md) documentation for more information on configuring where blobs are stored.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# 4.5.0

#### HarperDB 4.5.0

2/?/2025

### Blob Storage
4.5 introduces a new [Blob storage system](../../reference/blob.md) designed to efficiently handle large binary objects, with built-in support for streaming large content/media in and out of storage. This provides significantly better performance and functionality for large unstructured data, such as HTML, images, video, and other large files. Components can leverage this functionality through the JavaScript `Blob` interface and the new `createBlob` function. Blobs are fully replicated and integrated.

### Password Hashing Upgrade
4.5 adds two new password hashing algorithms for better security (to replace md5):
* `sha256`: A solid general-purpose password hashing algorithm, with good security properties and excellent performance. This is the default algorithm in 4.5.
* `argon2id`: This provides the highest level of security and is the recommended algorithm for deployments that do not require frequent password verifications. However, it is more CPU intensive and may not be suitable for environments with a high frequency of password verifications.

### Resource and Storage Analytics
16+
4.5 includes numerous new analytics for resources and storage, including page faults, context switches, free space, disk usage, and other metrics.
17+
### Default Replication Port
The default port for replication has been changed from 9925 to 9933.

### Property Forwarding
Record properties on resource instances are now accessible through standard property access syntax, regardless of whether the property was declared in a schema. Previously, only properties declared in a schema were accessible this way. This change allows for more consistent and intuitive access to record properties, regardless of how they were defined. It is still recommended to declare properties in a schema for better performance and documentation.

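A minimal sketch of what this enables (the class and property names are illustrative, not from the release):

```javascript
export class GreetingEndpoint extends MyTable {
	async get() {
		// `nickname` need not be declared in the table schema to be readable here
		return { message: `Hello, ${this.nickname}` };
	}
}
```
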
### Cluster Status Information
The `cluster_status` operation now includes new statistics for replication, including the timestamps of last received transactions, sent transactions, and committed transactions.
