Open
Description
We use httpfs to query data from S3.
Because httplib has no HTTP connection pool, many TCP connections are left in TIME_WAIT after a query (see #82).
We were hoping that switching to the curl client implementation would resolve the problem.
However, it does not work at the moment.
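To quantify the TIME_WAIT build-up described above, here is a minimal sketch that counts sockets in that state. It assumes a Linux host, where `/proc/net/tcp` lists IPv4 sockets and the `st` column holds the state as a hex code (`06` is TIME_WAIT). The function name `count_time_wait` is hypothetical and not part of DuckDB or httpfs.

```python
def count_time_wait(proc_net_tcp_text):
    """Count sockets in TIME_WAIT from the text of /proc/net/tcp (Linux only)."""
    count = 0
    # Skip the header row; each remaining row describes one socket.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        # fields[3] is the "st" column; 06 is the kernel code for TIME_WAIT.
        if len(fields) > 3 and fields[3] == "06":
            count += 1
    return count

# Usage on a Linux host (run before and after the query and compare):
#   with open("/proc/net/tcp") as f:
#       print(count_time_wait(f.read()))
```

This covers IPv4 sockets only; IPv6 sockets are listed separately in `/proc/net/tcp6`.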
Environment
DuckDB: 1.4.0
httpfs: 1.4.0
Init
INSTALL httpfs;
LOAD httpfs;
CREATE SECRET s3 (TYPE S3,ENDPOINT 'obs.cn-east-3.myhuaweicloud.com',URL_STYLE 'vhost',KEY_ID 'xxx',SECRET 'xxx',REGION 'cn-east-3');
Query
select o.count c_0, o.transferCount c_2 from (
SELECT
SUM (1) count,SUM (case when obj.workflowTransferId = 918851006576619520 then 1 else 0 end) transferCount
FROM
(
SELECT
frameId,
UNNEST ( json_extract ( labelObjects, '$.*' ) ) AS obj
FROM
read_json_auto ( 's3://seedpro-pref/LABEL_RESULT/ALL_3D/898323320727146496/898330725189947392/898336779609845760/898336781522186240/0/*.json.zst', compression = 'zstd', columns = {
'taskId':'BIGINT',
'frameId':'BIGINT',
'labelToolId':'BIGINT',
'labelObjects':'JSON',
'attrScope':'VARCHAR'} )
WHERE
taskId = 898336779609845760
)
WHERE
( json_extract_string ( obj, '$.objectType' ) != 'AI' OR obj.aiApplied = TRUE )
AND ( obj.unSegmentation IS NULL OR obj.unSegmentation = FALSE )
AND json_extract_string ( obj, '$.toolType' ) != 'DASHED_CURVE' AND json_extract_string ( obj, '$.attrScope' ) = 'LOCATE'
) o;
Run
default/httplib
setting:
SET httpfs_client_implementation='default';
SET httpfs_client_implementation='httplib';
log:
100% ▕██████████████████████████████████████▏ (00:00:04.32 elapsed)
┌────────┬────────┐
│ c_0 │ c_2 │
│ int128 │ int128 │
├────────┼────────┤
│ 49 │ 7 │
└────────┴────────┘
curl
setting:
SET httpfs_client_implementation='curl';
log:
IO Error:
URL using bad/illegal format or missing URL error for HTTP GET to '/?encoding-type=url&list-type=2&prefix=LABEL_RESULT%2FALL_3D%2F898323320727146496%2F898330725189947392%2F898336779609845760%2F898336781522186240%2F0%2F'
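The target quoted in the error above is only a path and query string, with no scheme or host. For comparison, here is a rough sketch of what a complete ListObjectsV2 request URL could look like for this bucket with vhost-style addressing. The helper `list_objects_v2_url` is hypothetical, the `https` scheme is an assumption, and this is not how httpfs builds the URL internally. It is only meant to show which part appears to be missing.

```python
from urllib.parse import quote, urlencode

def list_objects_v2_url(bucket, endpoint, prefix):
    """Build a vhost-style S3 ListObjectsV2 URL (illustrative only)."""
    # Percent-encode everything, including "/", as seen in the logged query string.
    query = urlencode(
        {"encoding-type": "url", "list-type": "2", "prefix": prefix},
        quote_via=quote,
        safe="",
    )
    # vhost style: the bucket name is part of the host name.
    # The https scheme is an assumption; the secret above does not specify it.
    return f"https://{bucket}.{endpoint}/?{query}"

url = list_objects_v2_url(
    "seedpro-pref",
    "obs.cn-east-3.myhuaweicloud.com",
    "LABEL_RESULT/ALL_3D/898323320727146496/898330725189947392/"
    "898336779609845760/898336781522186240/0/",
)
print(url)
```

The path and query portion of the result matches the string in the error message, which suggests (but does not prove) that the scheme and host prefix are not reaching curl.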