Skip to content

Use curl to query from s3 error #127

@naah69

Description

@naah69

We use httpfs to query from s3.
Because there isn't http connection pool in httplib,there are many TIME_WAIT tcp after query. link #82
We are looking forward to use curllib to reslove the problem.
But it don't work now.

Envirement

Duckdb: 1.4.0
httpfs: 1.4.0

Init

INSTALL httpfs;
LOAD httpfs;
CREATE SECRET s3 (TYPE S3,ENDPOINT 'obs.cn-east-3.myhuaweicloud.com',URL_STYLE 'vhost',KEY_ID 'xxx',SECRET 'xxx',REGION 'cn-east-3');

Query

select o.count c_0, o.transferCount c_2 from (
  SELECT
   SUM (1) count,SUM (case when obj.workflowTransferId = 918851006576619520 then 1 else 0 end) transferCount
  FROM
  	(
  	SELECT
  		frameId,
  		UNNEST ( json_extract ( labelObjects, '$.*' ) ) AS obj
  	FROM
  		read_json_auto ( 's3://seedpro-pref/LABEL_RESULT/ALL_3D/898323320727146496/898330725189947392/898336779609845760/898336781522186240/0/*.json.zst', compression = 'zstd', columns = {
                                                                    'taskId':'BIGINT',
                                                                    'frameId':'BIGINT',
                                                                    'labelToolId':'BIGINT',
                                                                    'labelObjects':'JSON',
                                                                    'attrScope':'VARCHAR'} )
  	WHERE
  		taskId = 898336779609845760
  	)
  WHERE
  	( json_extract_string ( obj, '$.objectType' ) != 'AI' OR obj.aiApplied = TRUE )
  	AND ( obj.unSegmentation IS NULL OR obj.unSegmentation = FALSE )
  	AND json_extract_string ( obj, '$.toolType' ) != 'DASHED_CURVE' AND json_extract_string ( obj, '$.attrScope' ) = 'LOCATE'
  	) o;

Run

default/httplib

setting:

SET httpfs_client_implementation='default';
SET httpfs_client_implementation='httplib';

log:

100% ▕██████████████████████████████████████▏ (00:00:04.32 elapsed)     
┌────────┬────────┐
│  c_0   │  c_2   │
│ int128 │ int128 │
├────────┼────────┤
│   49   │   7    │
└────────┴────────┘

curl

setting:

SET httpfs_client_implementation='curl';

log:

IO Error:
URL using bad/illegal format or missing URL error for HTTP GET to '/?encoding-type=url&list-type=2&prefix=LABEL_RESULT%2FALL_3D%2F898323320727146496%2F898330725189947392%2F898336779609845760%2F898336781522186240%2F0%2F'

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions