
The discrete feature values in the gcformat sample data generated by the OpenMLDB SQL feature extraction script are inconsistent with those calculated by the PICO script #3923

@yht520100


Bug Description
Service Version: 0.9.0
The discrete feature values in the gcformat sample data generated by the OpenMLDB SQL feature extraction script are inconsistent with those calculated by the PICO script.

Expected Behavior
Current (incorrect) format: label| slot:sign:origin-value
Correct format: label index| slot:sign:origin-value
In addition, the discrete feature signs should match those computed by the PICO script.
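To make the header difference concrete, here is a minimal Python sketch that splits one sample line into its label, its optional index, and its feature tokens. It assumes only the layout visible in the samples in this issue: a whitespace-separated header before "|", then space-separated "slot:sign" (discrete) or "slot:sign:value" (continuous) tokens. parse_instance is a hypothetical helper, not part of OpenMLDB or PICO.

```python
from typing import List, Optional, Tuple

# (slot, sign, value); value is None for discrete tokens written as "slot:sign".
Feature = Tuple[int, int, Optional[str]]


def parse_instance(line: str) -> Tuple[int, Optional[int], List[Feature]]:
    """Split one sample line into (label, index, features).

    The header before "|" is either "label" (current OpenMLDB output)
    or "label index" (the layout produced by PICO).
    """
    header, sep, body = line.partition("|")
    if not sep:
        raise ValueError(f"missing '|' separator: {line!r}")
    head = header.split()
    if len(head) not in (1, 2):
        raise ValueError(f"unexpected header: {header!r}")
    label = int(head[0])
    index = int(head[1]) if len(head) == 2 else None

    features: List[Feature] = []
    for token in body.split():
        # At most three fields: slot, sign and (for continuous features) value.
        parts = token.split(":", 2)
        value = parts[2] if len(parts) == 3 else None
        features.append((int(parts[0]), int(parts[1]), value))
    return label, index, features


print(parse_instance("0| 1:0:1 2:4599670039981440374")[:2])  # (0, None)
print(parse_instance("0 0| 2:-8773247204422130117:1 3:4042412524814531440")[:2])  # (0, 0)
```

The first call uses an OpenMLDB-style line (no index); the second uses a PICO-style line, where the index is present.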

Related Case
OpenMLDB SQL Feature Extraction Example:

0| 1:0:1 2:4599670039981440374 3:6365000770384461703 4:0:93.200000
1| 1:0:2 2:5613161932270271752 3:-1384602352766124944 4:0:93.075000
0| 1:0:3 2:4599670039981440374 3:-6239076729344379818 4:0:92.893000

PICO Feature Extraction Example:

0 0| 2:-8773247204422130117:1 3:4042412524814531440 4:6048373541161169225 5:4681710344575317709:0x1.74ccccccccccdp6
1 1| 2:-8773247204422130117:2 3:6142047291687075953 4:1461111459061395210 5:4681710344575317709:0x1.744cccccccccdp6
0 2| 2:-8773247204422130117:3 3:4042412524814531440 4:3353218529862650678 5:4681710344575317709:0x1.73926e978d4fep6
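In the samples above, the continuous feature in the last slot appears to carry the same value in both outputs and differs only in notation: decimal in OpenMLDB, hexadecimal floating point in PICO. The sketch below checks this, assuming the PICO value is a standard hex-float literal that Python's float.fromhex can parse; same_value is a hypothetical helper for illustration.

```python
import math

# Last-slot values copied from the three sample lines of each output above.
openmldb_values = ["93.200000", "93.075000", "92.893000"]
pico_values = ["0x1.74ccccccccccdp6", "0x1.744cccccccccdp6", "0x1.73926e978d4fep6"]


def same_value(decimal_text: str, hex_text: str) -> bool:
    """Return True if a decimal string and a hex-float string encode the same number."""
    return math.isclose(float(decimal_text), float.fromhex(hex_text), rel_tol=1e-12)


for dec, hx in zip(openmldb_values, pico_values):
    print(f"{dec} vs {hx}: same value = {same_value(dec, hx)}")
```

If the continuous values agree, the remaining differences between the two outputs lie in the header (missing index), the slot numbering, and the signs.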

Steps to Reproduce

  1. Data schema:
id[Int],age[Int],job[String],cons_price_idx[Double],y[Int]
  2. PICO feature extraction script:
target_y = binary_label(y)
f_id = continuous(id)
f_age = discrete(age)
f_job = discrete(job)
f_cons_price_idx = continuous(cons_price_idx)
  3. OpenMLDB SQL feature extraction script:
select gcformat(
       binary_label(bool(y)),
       continuous(id),
       discrete(age),
       discrete(job),
       continuous(cons_price_idx)
) as instance from main_table
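Until gcformat emits the index itself, one possible post-processing workaround is to insert a row index into each OpenMLDB output line. This is a hypothetical sketch: add_row_index is not an OpenMLDB API, and it assumes the index is the 0-based row position, as the PICO sample (0, 1, 2) suggests. It does not change slot numbers or discrete signs, so it does not resolve the discrete-value mismatch reported here.

```python
from typing import Iterable, Iterator


def add_row_index(lines: Iterable[str], start: int = 0) -> Iterator[str]:
    """Rewrite 'label| features' into 'label index| features'.

    Hypothetical helper: only the header is changed; feature tokens are
    passed through untouched.
    """
    for i, line in enumerate(lines, start=start):
        header, sep, body = line.partition("|")
        if not sep:
            raise ValueError(f"missing '|' separator: {line!r}")
        head = header.split()
        if len(head) != 1:
            raise ValueError(f"expected a single label before '|': {line!r}")
        yield f"{head[0]} {i}|{body}"


rows = ["0| 1:0:1 2:4599670039981440374", "1| 1:0:2 2:5613161932270271752"]
for out in add_row_index(rows):
    print(out)
```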

Labels: bug (Something isn't working)