This repository was archived by the owner on Oct 27, 2022. It is now read-only.

Commit b8d3c26

Merge pull request #1075 from dmann/DDS-974-enforce_upload_chunk_size
DDS-974 enforce upload chunk size
2 parents 8252bb4 + 00297af commit b8d3c26

23 files changed: +532 / -47 lines
Lines changed: 174 additions & 0 deletions

# DDS-974 Enforce Upload Chunk Size

## Deployment View

**NOTE:** The following must be done **before** the CircleCI build.

For `rake db:data:migrate` to work, the following environment variables must be set in all Heroku applications:

- `heroku config:set SWIFT_CHUNK_MAX_NUMBER=1000`
- `heroku config:set SWIFT_CHUNK_MAX_SIZE_BYTES=5368709122`
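
A minimal sketch of how a data migration might read and sanity-check these two variables before populating `storage_providers`. Only the variable names come from this document; the helper `chunk_limits_from_env` is hypothetical, and an `ENV`-like hash is passed in so the sketch runs standalone.

```ruby
# Hypothetical helper: reads the two Swift limits from an ENV-like hash.
# Hash#fetch raises KeyError when a variable is missing, and Integer() raises
# ArgumentError on non-numeric strings, so a misconfigured app fails loudly.
def chunk_limits_from_env(env = ENV)
  max_number = Integer(env.fetch('SWIFT_CHUNK_MAX_NUMBER'))
  max_size = Integer(env.fetch('SWIFT_CHUNK_MAX_SIZE_BYTES'))
  raise ArgumentError, 'limits must be positive' unless max_number > 0 && max_size > 0

  { chunk_max_number: max_number, chunk_max_size_bytes: max_size }
end

limits = chunk_limits_from_env({
  'SWIFT_CHUNK_MAX_NUMBER' => '1000',
  'SWIFT_CHUNK_MAX_SIZE_BYTES' => '5368709122'
})
puts limits[:chunk_max_number] * limits[:chunk_max_size_bytes] # 5368709122000
```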

We must back up the PostgreSQL database! This is because the `size` fields in uploads/chunks had to be changed from `int` to `bigint` to allow for large files.
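
The arithmetic behind the `int` to `bigint` change, as a quick check: PostgreSQL's `integer` is a signed 32-bit type, so it cannot hold even one maximum-size chunk. The migrations in this commit express the wider type as `:integer, limit: 8`, which shows up as `t.bigint` in the regenerated `db/schema.rb`.

```ruby
# PostgreSQL `integer` is signed 32-bit; `bigint` is signed 64-bit.
PG_INTEGER_MAX = 2**31 - 1 # 2_147_483_647
PG_BIGINT_MAX = 2**63 - 1

chunk_max_size_bytes = 5_368_709_122 # SWIFT_CHUNK_MAX_SIZE_BYTES above
file_max_size_bytes = chunk_max_size_bytes * 1000 # times SWIFT_CHUNK_MAX_NUMBER

puts chunk_max_size_bytes > PG_INTEGER_MAX # true: a single chunk overflows int
puts file_max_size_bytes <= PG_BIGINT_MAX  # true: bigint has ample headroom
```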

## Logical View

A critical issue was recently exposed in the chunked upload capability. The storage provider (Swift) is configured with a maximum number of segments (i.e., Data Service chunks), beyond which it cannot coalesce a large file. To remedy this, the API will be extended to enforce a minimum chunk size, based on the overall file size and the maximum segments setting.
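
The minimum chunk size is a ceiling division of the file size by the maximum segment count: any smaller chunk size would need more than `chunk_max_number` chunks. A standalone sketch of that rule (the function names are hypothetical); it uses integer ceiling division in place of the `(size.to_f / chunk_max_number).ceil` form used by `Upload#minimum_chunk_size` in this commit, and the two agree for the sizes discussed here.

```ruby
# Smallest chunk size (bytes) that lets a file of `file_size` bytes fit in at
# most `max_chunks` chunks: ceil(file_size / max_chunks), in integer arithmetic.
def minimum_chunk_size(file_size, max_chunks)
  (file_size + max_chunks - 1) / max_chunks
end

# Number of chunks a client would create at a given chunk size.
def chunk_count(file_size, chunk_size)
  (file_size + chunk_size - 1) / chunk_size
end

file_size = 10_000_000_001 # ~10 GB
min = minimum_chunk_size(file_size, 1000)
puts min                             # 10000001
puts chunk_count(file_size, min)     # 1000
puts chunk_count(file_size, min - 1) # 1001, i.e. one segment too many
```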

#### Summary of impacted APIs

|Endpoint |Description |
|---|---|
| `GET /storage_providers` | List supported storage providers. |
| `GET /storage_providers/{id}` | Get a storage provider. |
| `POST /projects/{id}/uploads` | Initiate a chunked upload. |
| `PUT /uploads/{id}/chunks` | Generate and return a pre-signed upload URL for a chunk. |

#### API Specification

This section defines the proposed API interface extensions.

##### List supported storage providers / Get a storage provider

`GET /storage_providers` / `GET /storage_providers/{id}`

###### Response Example (Extensions)

The following properties will be added to the storage providers resource:

+ **chunk\_max\_size\_bytes** - The maximum size of a chunk that can be uploaded.
+ **chunk\_max\_number** - The maximum number of chunks that can be uploaded for a single file.
+ **file\_max\_size\_bytes** - The maximum supported file size that can be uploaded (`chunk_max_size_bytes * chunk_max_number`).

```
{
  "id": "g5579f73-0558-4f96-afc7-9d251e65bv33",
  "name": "duke_oit_swift",
  "description": "Duke OIT Storage",
  "chunk_hash_algorithm": "md5",
  "chunk_max_size_bytes": 5368709120,
  "chunk_max_number": 1000,
  "file_max_size_bytes": 5368709120000,
  "is_deprecated": false
}
```
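
A client can derive `file_max_size_bytes` from the two new properties. A minimal sketch using Ruby's standard `json` library; the JSON is trimmed from the example above and the helper name is hypothetical.

```ruby
require 'json'

provider_json = <<~JSON
  {
    "name": "duke_oit_swift",
    "chunk_max_size_bytes": 5368709120,
    "chunk_max_number": 1000
  }
JSON

# Hypothetical client helper: largest file the provider can coalesce.
def file_max_size_bytes(provider)
  provider.fetch('chunk_max_size_bytes') * provider.fetch('chunk_max_number')
end

provider = JSON.parse(provider_json)
puts file_max_size_bytes(provider) # 5368709120000
```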

##### Initiate a chunked upload

`POST /projects/{id}/uploads`

###### Response Headers (Extensions)

The following custom response headers will be added to inform clients of the minimum chunk size that can be used while still ensuring the chunks can be coalesced, as well as the maximum chunk size the storage provider can accommodate.

+ **X-MIN-CHUNK-UPLOAD-SIZE** - The minimum chunk size in bytes.
+ **X-MAX-CHUNK-UPLOAD-SIZE** - The maximum chunk size in bytes.
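
One way a client might pick its chunk size from these two headers: start from a preferred size and clamp it into the allowed range. The preferred size, the helper name, and reading headers from a plain hash are assumptions for illustration; no particular HTTP library is implied. The upper bound is kept strictly below the maximum because the `Chunk` model in this commit validates `size` with `less_than`.

```ruby
# Assumed client preference, not part of the API: 100 MiB chunks.
DEFAULT_CHUNK_SIZE = 100 * 1024 * 1024

# Chooses a chunk size from the X-MIN/X-MAX-CHUNK-UPLOAD-SIZE response headers.
# `headers` is any Hash of header name => string value (hypothetical shape).
def choose_chunk_size(headers, preferred = DEFAULT_CHUNK_SIZE)
  min = Integer(headers.fetch('X-MIN-CHUNK-UPLOAD-SIZE'))
  max = Integer(headers.fetch('X-MAX-CHUNK-UPLOAD-SIZE'))
  # Chunk#size is validated with `less_than`, so stay strictly below max.
  upper = max - 1
  raise ArgumentError, "no valid chunk size in #{min}...#{max}" if min > upper

  preferred.clamp(min, upper)
end

headers = { 'X-MIN-CHUNK-UPLOAD-SIZE' => '10000001', 'X-MAX-CHUNK-UPLOAD-SIZE' => '5368709120' }
puts choose_chunk_size(headers) # 104857600 (the 100 MiB preference already fits)
```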

###### Response Messages (Extensions)

+ 400 - File size is currently not supported - maximum size is {max_segments * max_chunk_upload_size}

###### Response Example

```
{
  "error": "400",
  "code": "not_provided",
  "reason": "validation failed",
  "suggestion": "Fix the following invalid fields and resubmit",
  "errors": [
    {
      "field": "size",
      "message": "File size is currently not supported - maximum size is {max_segments * max_chunk_upload_size}"
    }
  ]
}
```
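
The upload-level check reduces to a single comparison. A plain-Ruby sketch of the rule that `Upload` enforces in this commit (`size` must be `less_than` `max_size_bytes`, so a file of exactly the maximum size is rejected); the function name is hypothetical and the storage provider values are passed in explicitly.

```ruby
# Returns nil when the file size is acceptable, otherwise the message that the
# `size` field would carry in the 400 response.
def file_size_error(size, chunk_max_number, chunk_max_size_bytes)
  max_size_bytes = chunk_max_number * chunk_max_size_bytes
  return nil if size < max_size_bytes

  "File size is currently not supported - maximum size is #{max_size_bytes}"
end

p file_size_error(1_000, 1000, 5_368_709_120)
# => nil
p file_size_error(6 * 1024**4, 1000, 5_368_709_120)
# => "File size is currently not supported - maximum size is 5368709120000"
```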

##### Generate and return a pre-signed upload URL for a chunk

`PUT /uploads/{id}/chunks`

###### Response Messages (Extensions)

+ 400 - Invalid chunk size specified - must be in range {min}-{max}
+ 400 - Upload chunks exceeded, must be less than {max}
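
Both chunk rules, sketched as one plain-Ruby function that mirrors the `Chunk` validations in this commit: the size must be at least the upload's minimum and below the provider's maximum, and a new chunk is accepted only while the upload has fewer than `chunk_max_number` chunks. The function and its keyword arguments are hypothetical; in the application these values come from `Upload` and `StorageProvider`.

```ruby
# Returns the error messages a new chunk would produce (empty array = valid).
def chunk_errors(size:, min_size:, max_size:, existing_chunks:, max_chunks:)
  errors = []
  unless size >= min_size && size < max_size
    errors << "Invalid chunk size specified - must be in range #{min_size}-#{max_size}"
  end
  # For a new chunk, the count is taken before it is saved, so it excludes itself.
  errors << 'maximum upload chunks exceeded.' unless existing_chunks < max_chunks
  errors
end

p chunk_errors(size: 10_000_001, min_size: 10_000_001, max_size: 5_368_709_120,
               existing_chunks: 999, max_chunks: 1000)
# => []
p chunk_errors(size: 42, min_size: 10_000_001, max_size: 5_368_709_120,
               existing_chunks: 1000, max_chunks: 1000)
# => ["Invalid chunk size specified - must be in range 10000001-5368709120", "maximum upload chunks exceeded."]
```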

###### Response Example

```
{
  "error": "400",
  "code": "not_provided",
  "reason": "validation failed",
  "suggestion": "Fix the following invalid fields and resubmit",
  "errors": [
    {
      "field": "size",
      "message": "Invalid chunk size specified - must be in range {min}-{max}"
    }
  ]
}
```

or

```
{
  "error": "400",
  "code": "not_provided",
  "reason": "maximum upload chunks exceeded.",
  "suggestion": ""
}
```

## Implementation View

+ The official GCB Python client and the DDS Web portal client will need to be modified to use these chunked upload API extensions.

+ The Swift `max_manifest_segments` setting will be raised to 2000, and all uploads that are inconsistent because they exceeded the prior limit of 1000 will be re-queued for processing.

## Process View

Add notes about performance, scalability, throughput, etc. here. These can inform future proposals to change the implementation.

This design introduces a change to the error response for validation errors.
Most validation error responses will remain unchanged, reporting a list of field
errors that must be addressed:
```
{
  "error": "400",
  "code": "not_provided",
  "reason": "validation failed",
  "suggestion": "Fix the following invalid fields and resubmit",
  "errors": [
    {
      "field": "<field name>",
      "message": "something is wrong with this"
    }
  ]
}
```

Some validation errors apply to the entire object rather than to any specific
field, such as when a user attempts to delete a property or template that is
associated with an object, or to create a chunk that exceeds the storage provider's
`chunk_max_number`.

Previously, these errors were returned in the `errors` list, labeled
`base`:
```
{
  "error": "400",
  "code": "not_provided",
  "reason": "validation failed",
  "suggestion": "Fix the following invalid fields and resubmit",
  "errors": [
    {
      "field": "base",
      "message": "something is wrong with this"
    }
  ]
}
```

Going forward, these object errors will be placed into `reason`, and the response
payload may or may not list other invalid fields as well. If there are
no invalid fields, `suggestion` will be an empty string, and the payload will not
contain an `errors` entry.
```
{
  "error": "400",
  "code": "not_provided",
  "reason": "something is wrong with this.",
  "suggestion": ""
}
```
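
The new payload rules can be sketched without ActiveModel: object-level (`:base`) messages become the `reason`, and the `errors` list plus the standard suggestion appear only when there are field errors. The function below is hypothetical; it takes a plain `{field => [messages]}` hash as a stand-in for `object.errors.messages` and approximates the behavior of `validation_error!` in `app/api/dds/v1/base.rb`, not the helper itself.

```ruby
# messages: Hash of field (Symbol) => Array of message strings, shaped like
# ActiveModel's `errors.messages`.
def validation_error_payload(messages)
  payload = { error: '400', code: 'not_provided', reason: 'validation failed', suggestion: '' }

  # Object-level errors replace the generic reason.
  base = messages.fetch(:base, [])
  payload[:reason] = base.join(' ') unless base.empty?

  # Every remaining field message becomes one {field:, message:} entry.
  field_errors = messages.reject { |field, _| field == :base }.flat_map do |field, errors|
    errors.map { |message| { field: field, message: message } }
  end
  unless field_errors.empty?
    payload[:errors] = field_errors
    payload[:suggestion] = 'Fix the following invalid fields and resubmit'
  end
  payload
end

base_only = validation_error_payload({ base: ['maximum upload chunks exceeded.'] })
with_field = validation_error_payload({ size: ['is too big'] })
```

With only a `:base` message, `base_only` carries that message as its `reason`, an empty `suggestion`, and no `errors` key; `with_field` keeps `reason: 'validation failed'` and lists the `size` error.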

app/api/dds/v1/base.rb

Lines changed: 11 additions & 4 deletions

```diff
@@ -104,17 +104,24 @@ def validation_error!(object)
         error: '400',
         code: "not_provided",
         reason: 'validation failed',
-        suggestion: 'Fix the following invalid fields and resubmit',
-        errors: []
+        suggestion: ''
       }
-      object.errors.messages.each do |field, errors|
+      unless object.errors.messages[:base].empty?
+        error_payload[:reason] = object.errors.messages[:base].join(' ')
+      end
+      field_errors = []
+      object.errors.messages.reject{|field| field == :base }.each do |field, errors|
         errors.each do |message|
-          error_payload[:errors] << {
+          field_errors << {
             field: field,
             message: message
           }
         end
       end
+      unless field_errors.empty?
+        error_payload[:errors] = field_errors
+        error_payload[:suggestion] = 'Fix the following invalid fields and resubmit'
+      end
       error!(error_payload, 400)
     end
 
```

app/api/dds/v1/uploads_api.rb

Lines changed: 2 additions & 0 deletions

```diff
@@ -37,6 +37,8 @@ class UploadsAPI < Grape::API
         })
         authorize upload, :create?
         if upload.save
+          header 'X-MIN-CHUNK-UPLOAD-SIZE', upload.minimum_chunk_size
+          header 'X-MAX-CHUNK-UPLOAD-SIZE', upload.max_size_bytes
           upload
         else
           validation_error!(upload)
```

app/models/chunk.rb

Lines changed: 23 additions & 2 deletions

```diff
@@ -11,13 +11,24 @@ class Chunk < ActiveRecord::Base
   has_many :project_permissions, through: :upload
 
   validates :upload_id, presence: true
-  validates :number, presence: true,
+  validates :number, presence: true,
     uniqueness: {scope: [:upload_id], case_sensitive: false}
   validates :size, presence: true
+  validates :size, numericality: {
+    less_than: :chunk_max_size_bytes,
+    greater_than_or_equal_to: :minimum_chunk_size,
+    message: ->(object, data) do
+      "Invalid chunk size specified - must be in range #{object.minimum_chunk_size}-#{object.chunk_max_size_bytes}"
+    end
+  }, if: :storage_provider
+
   validates :fingerprint_value, presence: true
   validates :fingerprint_algorithm, presence: true
 
-  delegate :project_id, to: :upload
+  validate :upload_chunk_maximum, if: :storage_provider
+
+  delegate :project_id, :minimum_chunk_size, to: :upload
+  delegate :chunk_max_size_bytes, to: :storage_provider
 
   def http_verb
     'PUT'
@@ -47,8 +58,18 @@ def url
     storage_provider.build_signed_url(http_verb, sub_path, expiry)
   end
 
+  def total_chunks
+    upload.chunks.count
+  end
+
   private
 
+  def upload_chunk_maximum
+    unless total_chunks < storage_provider.chunk_max_number
+      errors[:base] << 'maximum upload chunks exceeded.'
+    end
+  end
+
   def update_upload_etag
     last_audit = self.audits.last
     new_comment = last_audit.comment ? last_audit.comment.merge({raised_by_audit: last_audit.id}) : {raised_by_audit: last_audit.id}
```

app/models/storage_provider.rb

Lines changed: 3 additions & 1 deletion

```diff
@@ -10,7 +10,9 @@ class StorageProvider < ActiveRecord::Base
   validates :service_pass, presence: true
   validates :primary_key, presence: true
   validates :secondary_key, presence: true
-
+  validates :chunk_max_number, presence: true
+  validates :chunk_max_size_bytes, presence: true
+
   def auth_token
     call_auth_uri['x-auth-token']
   end
```

app/models/upload.rb

Lines changed: 15 additions & 1 deletion

```diff
@@ -14,8 +14,14 @@ class Upload < ActiveRecord::Base
 
   validates :project_id, presence: true
   validates :name, presence: true
-  validates :size, presence: true
   validates :storage_provider_id, presence: true
+  validates :size, presence: true
+  validates :size, numericality: {
+    less_than: :max_size_bytes,
+    message: ->(object, data) do
+      "File size is currently not supported - maximum size is #{object.max_size_bytes}"
+    end
+  }, if: :storage_provider
   validates :creator_id, presence: true
   validates :completed_at, immutable: true, if: :completed_at_was
   validates :completed_at, immutable: true, if: :error_at_was
@@ -93,6 +99,14 @@ def create_and_validate_storage_manifest
     end
   end
 
+  def max_size_bytes
+    storage_provider.chunk_max_number * storage_provider.chunk_max_size_bytes
+  end
+
+  def minimum_chunk_size
+    (size.to_f / storage_provider.chunk_max_number).ceil
+  end
+
   private
   def integrity_exception(message)
     exactly_now = DateTime.now
```

app/serializers/storage_provider_serializer.rb

Lines changed: 2 additions & 1 deletion

```diff
@@ -1,5 +1,6 @@
 class StorageProviderSerializer < ActiveModel::Serializer
-  attributes :id, :name, :description, :is_deprecated, :chunk_hash_algorithm
+  attributes :id, :name, :description, :is_deprecated, :chunk_hash_algorithm,
+    :chunk_max_number, :chunk_max_size_bytes
 
   def name
     object.display_name
```

Lines changed: 6 additions & 0 deletions

```diff
@@ -0,0 +1,6 @@
+class AddChunkMaxNumberChunkMaxSizeBytesToStorageProviders < ActiveRecord::Migration[5.0]
+  def change
+    add_column :storage_providers, :chunk_max_number, :integer
+    add_column :storage_providers, :chunk_max_size_bytes, :integer, limit: 8
+  end
+end
```

Lines changed: 5 additions & 0 deletions

```diff
@@ -0,0 +1,5 @@
+class ChangeChunkSize < ActiveRecord::Migration[5.0]
+  def change
+    change_column :chunks, :size, :integer, limit: 8
+  end
+end
```

db/schema.rb

Lines changed: 4 additions & 2 deletions

```diff
@@ -10,7 +10,7 @@
 #
 # It's strongly recommended that you check this file into your version control system.
 
-ActiveRecord::Schema.define(version: 20170622204323) do
+ActiveRecord::Schema.define(version: 20170912182841) do
 
   # These are extensions that must be enabled in order to support this database
   enable_extension "plpgsql"
@@ -98,7 +98,7 @@
   create_table "chunks", id: :uuid, default: -> { "uuid_generate_v4()" }, force: :cascade do |t|
     t.uuid "upload_id"
     t.integer "number"
-    t.integer "size"
+    t.bigint "size"
     t.string "fingerprint_value"
     t.string "fingerprint_algorithm"
     t.datetime "created_at", null: false
@@ -264,6 +264,8 @@
     t.datetime "created_at", null: false
     t.datetime "updated_at", null: false
     t.string "chunk_hash_algorithm", default: "md5"
+    t.integer "chunk_max_number"
+    t.bigint "chunk_max_size_bytes"
   end
 
   create_table "system_permissions", id: :uuid, default: -> { "uuid_generate_v4()" }, force: :cascade do |t|
```
