Commit 7d47287

Wrote about blob-curl (#23)
Signed-off-by: David Söderlund <[email protected]>
1 parent 8b0c5f3 commit 7d47287

1 file changed

+184
-0
lines changed

@@ -0,0 +1,184 @@
---
title: Azure blob storage from curl
published: true
---

# Upload large files from tiny devices to Azure blob storage

This week I got to work on something a bit different from my normal wheelhouse.

I published [my source code here](https://github.com/QuadmanSWE/curl-blob), including examples for different auth schemes and different mechanisms of invocation.

## Scope

The customer needed quick help figuring out how to upload files mounted on a device with limited RAM and very little room for adding software.

The limitations of the hardware posed the question:

Could we do it with just curl?

We briefly looked at something like scp but couldn't produce a binary that fit the CPU architecture in time.

So we ran with what we had: dd, curl and parts of openssl for cryptographic signing.

## Making plans

First I drew up how I wanted to interact with the software and what I knew you need in order to interact with the Azure Blob Storage API.

- Account name
- Credentials
- Container name
- Blob path
- Local path

I knew I wanted to be able to run it in a Docker container for testing, to make sure I wasn't tricked by running on Windows or WSL.

Quick and dirty: get alpine, add openssl and curl, call a shell script to make it happen once you mount the file you want to upload, let's go.

``` Dockerfile
FROM alpine
RUN apk add --no-cache curl openssl
COPY upload.sh /upload.sh
ENTRYPOINT ["/bin/sh", "/upload.sh"]
```

## First prototype

GitHub Copilot was friendly enough to get me started on a shell script that could perform the steps. But it was tripped up by different versions in the API docs, so I had to make some corrections.

[Documentation on the Put Blob REST operation from Microsoft](https://learn.microsoft.com/en-us/rest/api/storageservices/put-blob)

Here is our first attempt to get a file to a blob storage container in one pass.

``` sh
#!/bin/sh
# upload.sh
set -e
# Arguments (environment variables win over positional arguments)
STORAGE_ACCOUNT_NAME=${STORAGE_ACCOUNT_NAME:-$1}
STORAGE_ACCOUNT_KEY=${STORAGE_ACCOUNT_KEY:-$2}
STORAGE_CONTAINER=${STORAGE_CONTAINER:-$3}
BLOB_PATH=${BLOB_PATH:-$4}
FILE_PATH=${FILE_PATH:-$5}

# File size in bytes; tr strips the padding some wc implementations add
BLOB_LENGTH=$(wc -c <"$FILE_PATH" | tr -d ' ')
BLOB_TYPE="BlockBlob"

# The account key is base64 encoded, and openssl wants the HMAC key as hex.
# This step was missing from the listing; od is one way to do the conversion.
decoded_hex_key=$(printf '%s' "$STORAGE_ACCOUNT_KEY" | base64 -d | od -An -tx1 | tr -d ' \n')

# Construct the URL
URL="https://${STORAGE_ACCOUNT_NAME}.blob.core.windows.net/${STORAGE_CONTAINER}/${BLOB_PATH}"
# Generate the headers
DATE_VALUE=$(date -u +"%a, %d %b %Y %H:%M:%S GMT")
STORAGE_SERVICE_VERSION="2019-12-12"
# Construct the CanonicalizedResource
CANONICALIZED_RESOURCE="/${STORAGE_ACCOUNT_NAME}/${STORAGE_CONTAINER}/${BLOB_PATH}"
# Construct the CanonicalizedHeaders
CANONICALIZED_HEADERS="x-ms-blob-type:${BLOB_TYPE}\nx-ms-date:${DATE_VALUE}\nx-ms-version:${STORAGE_SERVICE_VERSION}"
# Generate the signature (%b expands the \n escapes without treating the data as a format string)
STRING_TO_SIGN="PUT\n\n\n${BLOB_LENGTH}\n\n\n\n\n\n\n\n\n${CANONICALIZED_HEADERS}\n${CANONICALIZED_RESOURCE}"
SIGNATURE=$(printf '%b' "$STRING_TO_SIGN" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$decoded_hex_key" -binary | base64 -w0)
AUTHORIZATION_HEADER="SharedKey ${STORAGE_ACCOUNT_NAME}:${SIGNATURE}"
# Upload the file
curl -X PUT -T "${FILE_PATH}" -H "x-ms-blob-type: ${BLOB_TYPE}" -H "x-ms-date: ${DATE_VALUE}" -H "x-ms-version: ${STORAGE_SERVICE_VERSION}" -H "Authorization: ${AUTHORIZATION_HEADER}" "${URL}"
# Terminate
exit 0
```

This worked well. The construction of the arguments was a bit finicky, but we got there pretty quickly. The error messages from the API were very helpful most of the time.

The authentication mechanism is that you sign the request you are making (length, headers, resource), which proves that you hold the storage account key and that you want to perform exactly the operation matching the signature. The API performs the same computation on its side and compares the signatures.
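
To make the sign-and-compare idea concrete, here is a minimal offline sketch. The helper name `hmac_sha256_b64`, the demo key and the demo request values are all made up for illustration; only the HMAC-SHA256 step and the string-to-sign layout mirror the script above. It signs the same request description twice (standing in for the client and the API) and then signs a tampered one.

```shell
#!/bin/sh
# Hypothetical helper: HMAC-SHA256 over a message, keyed with a base64-encoded key.
# The message may contain \n escapes, like the STRING_TO_SIGN values in the post.
# Prints the signature as base64, the form the Authorization header carries.
hmac_sha256_b64() {
    key_b64=$1
    message=$2
    # Decode the key to raw bytes, then hex-encode it for openssl's hexkey option.
    key_hex=$(printf '%s' "$key_b64" | base64 -d | od -An -tx1 | tr -d ' \n')
    printf '%b' "$message" |
        openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" -binary |
        base64 | tr -d '\n'
}

# Made-up values, for illustration only.
DEMO_KEY="SmVmZQ=="   # base64 of the ASCII string "Jefe", not a real account key
DEMO_HEADERS='x-ms-blob-type:BlockBlob\nx-ms-date:Mon, 01 Jan 2024 00:00:00 GMT\nx-ms-version:2019-12-12'
DEMO_RESOURCE='/demoaccount/democontainer/demo.bin'
REQUEST="PUT\n\n\n1024\n\n\n\n\n\n\n\n\n${DEMO_HEADERS}\n${DEMO_RESOURCE}"
TAMPERED="PUT\n\n\n2048\n\n\n\n\n\n\n\n\n${DEMO_HEADERS}\n${DEMO_RESOURCE}"

CLIENT_SIG=$(hmac_sha256_b64 "$DEMO_KEY" "$REQUEST")    # what the device sends
SERVER_SIG=$(hmac_sha256_b64 "$DEMO_KEY" "$REQUEST")    # what the API recomputes
TAMPERED_SIG=$(hmac_sha256_b64 "$DEMO_KEY" "$TAMPERED") # same key, different length

if [ "$CLIENT_SIG" = "$SERVER_SIG" ]; then echo "signatures match"; fi
if [ "$CLIENT_SIG" != "$TAMPERED_SIG" ]; then echo "changed request no longer matches"; fi
```

Changing any signed field, here the content length, produces a different signature, which is why every request has to be signed separately.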

## What if the files don't fit in RAM?

Right, hardware specs.

We came up with the idea of striping the files and uploading the stripes one by one, but quickly found that Azure supports incremental uploads to an Append Blob.

We use dd to read exactly the right chunk of data and pipe it into curl.

Here was our resulting scheme. Note that we need to sign each request to upload another block / chunk.

``` sh
##### removed for brevity
CHUNK_SIZE=${CHUNK_SIZE:-$6}
# If CHUNK_SIZE is empty we always do a single blob upload
if [ -z "$CHUNK_SIZE" ]; then
    CHUNK_SIZE=$BLOB_LENGTH
fi

if [ "$BLOB_LENGTH" -le "$CHUNK_SIZE" ]; then
    ##### Previous example, removed for brevity
    exit 0
else
    # empty append blob
    CONTENT_TYPE="application/octet-stream"
    BLOB_TYPE="AppendBlob"
    CANONICALIZED_HEADERS="x-ms-blob-type:${BLOB_TYPE}\nx-ms-date:${DATE_VALUE}\nx-ms-version:${STORAGE_SERVICE_VERSION}"
    STRING_TO_SIGN="PUT\n\n\n\n\n${CONTENT_TYPE}\n\n\n\n\n\n\n${CANONICALIZED_HEADERS}\n${CANONICALIZED_RESOURCE}"
    SIGNATURE=$(printf '%b' "$STRING_TO_SIGN" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$decoded_hex_key" -binary | base64 -w0)
    AUTHORIZATION_HEADER="SharedKey ${STORAGE_ACCOUNT_NAME}:${SIGNATURE}"
    # Create an empty append blob
    curl -X PUT -H "Content-Type: ${CONTENT_TYPE}" -H "Content-Length: 0" -H "x-ms-blob-type: ${BLOB_TYPE}" -H "x-ms-date: ${DATE_VALUE}" -H "x-ms-version: ${STORAGE_SERVICE_VERSION}" -H "Authorization: ${AUTHORIZATION_HEADER}" "${URL}"
fi

# Upload the file in chunks
OFFSET=0
CHUNK_NUMBER=0
URL="${URL}?comp=appendblock"

# Each full chunk is signed and appended at the current offset
while [ $((OFFSET + CHUNK_SIZE)) -le "$BLOB_LENGTH" ]; do
    CANONICALIZED_HEADERS="x-ms-blob-condition-appendpos:${OFFSET}\nx-ms-blob-condition-maxsize:${BLOB_LENGTH}\nx-ms-date:${DATE_VALUE}\nx-ms-version:${STORAGE_SERVICE_VERSION}"
    STRING_TO_SIGN="PUT\n\n\n${CHUNK_SIZE}\n\n${CONTENT_TYPE}\n\n\n\n\n\n\n${CANONICALIZED_HEADERS}\n${CANONICALIZED_RESOURCE}\ncomp:appendblock"
    SIGNATURE=$(printf '%b' "$STRING_TO_SIGN" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$decoded_hex_key" -binary | base64 -w0)
    AUTHORIZATION_HEADER="SharedKey ${STORAGE_ACCOUNT_NAME}:${SIGNATURE}"
    # dd reads exactly one chunk, so only CHUNK_SIZE bytes are held at a time
    dd if="$FILE_PATH" bs="$CHUNK_SIZE" count=1 skip="$CHUNK_NUMBER" 2>/dev/null |
        curl -m 2 -X PUT --data-binary @- \
            -H "Content-Type: ${CONTENT_TYPE}" \
            -H "Content-Length: $CHUNK_SIZE" \
            -H "x-ms-blob-condition-maxsize: ${BLOB_LENGTH}" \
            -H "x-ms-blob-condition-appendpos: ${OFFSET}" \
            -H "x-ms-date: ${DATE_VALUE}" \
            -H "x-ms-version: ${STORAGE_SERVICE_VERSION}" \
            -H "Authorization: ${AUTHORIZATION_HEADER}" \
            "${URL}"
    OFFSET=$((OFFSET + CHUNK_SIZE))
    CHUNK_NUMBER=$((CHUNK_NUMBER + 1))
done
# ...
```
149+
150+
This works super well as long as the chunks align with the entire blob size.
151+
More often than not of course we will find that it doesn't, so we just calculate the last chunk size with modulo.
152+
153+
``` sh
154+
# ... continuing on
155+
LAST_CHUNK_SIZE=$(($BLOB_LENGTH % $CHUNK_SIZE))
156+
if [ $LAST_CHUNK_SIZE -gt 0 ]; then
157+
CANONICALIZED_HEADERS="x-ms-blob-condition-appendpos:${OFFSET}\nx-ms-blob-condition-maxsize:${BLOB_LENGTH}\nx-ms-date:${DATE_VALUE}\nx-ms-version:${STORAGE_SERVICE_VERSION}"
158+
STRING_TO_SIGN="PUT\n\n\n${LAST_CHUNK_SIZE}\n\n${CONTENT_TYPE}\n\n\n\n\n\n\n${CANONICALIZED_HEADERS}\n${CANONICALIZED_RESOURCE}\ncomp:appendblock"
159+
SIGNATURE=$(printf "$STRING_TO_SIGN" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$decoded_hex_key" -binary | base64 -w0)
160+
AUTHORIZATION_HEADER="SharedKey ${STORAGE_ACCOUNT_NAME}:${SIGNATURE}"
161+
dd if=$FILE_PATH bs=$CHUNK_SIZE count=1 skip=$CHUNK_NUMBER 2>/dev/null |
162+
curl -m 2 -X PUT --data-binary @- \
163+
-H "Content-Type: ${CONTENT_TYPE}" \
164+
-H "Content-Length: $LAST_CHUNK_SIZE" \
165+
-H "x-ms-blob-condition-maxsize: ${BLOB_LENGTH}" \
166+
-H "x-ms-blob-condition-appendpos: ${OFFSET}" \
167+
-H "x-ms-date: ${DATE_VALUE}" \
168+
-H "x-ms-version: ${STORAGE_SERVICE_VERSION}" \
169+
-H "Authorization: ${AUTHORIZATION_HEADER}" \
170+
"${URL}"
171+
fi
172+
```
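
The chunk arithmetic can be checked offline before pointing anything at Azure. The sketch below is only a local test harness, not part of the upload script: it creates a throwaway 2500-byte file (sizes chosen arbitrarily so that the last chunk is partial), carves it up with the same dd invocation, and glues the pieces back together to confirm nothing is lost or duplicated. The temporary paths and the `IDENTICAL` variable are made up for the example.

```shell
#!/bin/sh
# Offline sketch: split a file with dd the same way the upload loop does,
# then verify that the concatenated chunks reproduce the original.
WORKDIR=$(mktemp -d)
FILE_PATH="$WORKDIR/input.bin"
OUT_PATH="$WORKDIR/reassembled.bin"

# 2500 bytes of sample data, deliberately not a multiple of the chunk size.
head -c 2500 /dev/urandom > "$FILE_PATH"
CHUNK_SIZE=1024
BLOB_LENGTH=$(wc -c < "$FILE_PATH" | tr -d ' ')

FULL_CHUNKS=$((BLOB_LENGTH / CHUNK_SIZE))      # chunks sent by the while loop
LAST_CHUNK_SIZE=$((BLOB_LENGTH % CHUNK_SIZE))  # bytes left for the final request

: > "$OUT_PATH"
OFFSET=0
CHUNK_NUMBER=0
while [ "$OFFSET" -lt "$BLOB_LENGTH" ]; do
    # skip counts whole blocks of bs bytes, so chunk N starts at N * CHUNK_SIZE.
    dd if="$FILE_PATH" bs="$CHUNK_SIZE" count=1 skip="$CHUNK_NUMBER" 2>/dev/null >> "$OUT_PATH"
    OFFSET=$((OFFSET + CHUNK_SIZE))
    CHUNK_NUMBER=$((CHUNK_NUMBER + 1))
done

if cmp -s "$FILE_PATH" "$OUT_PATH"; then IDENTICAL=yes; else IDENTICAL=no; fi
rm -rf "$WORKDIR"

echo "full chunks: $FULL_CHUNKS, last chunk: $LAST_CHUNK_SIZE bytes, requests: $CHUNK_NUMBER"
echo "reassembled file identical: $IDENTICAL"
```

With these numbers there are two full 1024-byte chunks and a 452-byte remainder, so three append requests in total.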
173+
174+
## Distributing signing keys to an entire storage account might not scale well with thousands of devices.
175+
176+
The central solution that the devices help out has a bit more power and a lot more flexibilty in adding new software.
177+
178+
By generating SAS tokens for the places where each device will push their blobs, we can tailor and distribute those tokens rather easily.
179+
180+
It simplifies the authentication process greatly too.
181+
182+
Examples of using this auth mechanism can be found in the github repo linked at the top.
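
As a rough idea of what that looks like, here is a minimal sketch of a single-pass upload authorized by a SAS token. It is not taken from the repo: the function names `sas_blob_url` and `upload_with_sas`, and the account, container, path and token values, are placeholders I made up. The only assumption about Azure is that the SAS token is appended to the blob URL as its query string.

```shell
#!/bin/sh
# Hypothetical helper: build the blob URL with the SAS token as the query string.
sas_blob_url() {
    account=$1
    container=$2
    blob_path=$3
    sas_token=$4
    # Tokens are sometimes copied with a leading "?"; strip it so exactly one is added.
    sas_token=${sas_token#\?}
    printf 'https://%s.blob.core.windows.net/%s/%s?%s' \
        "$account" "$container" "$blob_path" "$sas_token"
}

# Hypothetical helper: upload a local file in one pass, with no signing on the device.
# Arguments: account, container, blob path, SAS token, local file path.
upload_with_sas() {
    url=$(sas_blob_url "$1" "$2" "$3" "$4")
    curl -X PUT -T "$5" -H "x-ms-blob-type: BlockBlob" "$url"
}

# Placeholder values; a real token would be issued by the central solution.
DEMO_URL=$(sas_blob_url "demoaccount" "democontainer" "device42/data.bin" "?sv=2019-12-12&sig=PLACEHOLDER")
echo "$DEMO_URL"
```

Compared with the Shared Key scripts, the date, canonicalized headers and HMAC step disappear from the device entirely.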
183+
184+
Cheers.
