nikhilsinhaparseable
Contributor

Get the level of hierarchy (nesting depth) from the JSON.
Perform generic flattening only if the level of nesting is <= 4.
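
A minimal sketch of what such a depth check could look like. It is not the merged implementation: the `Json` enum is a std-only stand-in for `serde_json::Value` so the example compiles without external crates, the `obj` helper is hypothetical, and the counting rule (only objects add a level, arrays do not) is an assumption. Only the function name and the starting level of `1` come from the call site under review.

```rust
/// Std-only stand-in for `serde_json::Value` (assumption, for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Hypothetical helper: builds a single-key object.
fn obj(key: &str, value: Json) -> Json {
    Json::Object(vec![(key.to_string(), value)])
}

/// Returns true when objects are nested more than four levels deep.
/// `current_level` is the level of `value` itself; callers start at 1.
/// Assumption: arrays are transparent and do not add a level.
fn has_more_than_four_levels(value: &Json, current_level: usize) -> bool {
    match value {
        Json::Object(fields) => {
            current_level > 4
                || fields
                    .iter()
                    .any(|(_, v)| has_more_than_four_levels(v, current_level + 1))
        }
        Json::Array(items) => items
            .iter()
            .any(|v| has_more_than_four_levels(v, current_level)),
        // Scalars never add depth.
        _ => false,
    }
}
```

Under these assumptions, `{"a":{"b":{"c":{"d":{"e":["a","b"]}}}}}` is rejected because the object holding `e` sits at level 5, while the same document without the `e` wrapper passes.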


coveralls commented Dec 30, 2024

Pull Request Test Coverage Report for Build 12618356442

Details

  • 109 of 255 (42.75%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 12.78%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/handlers/http/modal/utils/ingest_utils.rs | 0 | 146 | 0.0% |

Totals

  • Change from base Build 12596037238: 0.3%
  • Covered Lines: 2394
  • Relevant Lines: 18732

💛 - Coveralls

    /// 5. `{"a":{"b":{"c":{"d":{"e":["a","b"]}}}}}` ~> returns error - heavily nested, cannot flatten this JSON
    pub fn flatten_json(value: &Value) -> Result<Vec<Value>, anyhow::Error> {
        if has_more_than_four_levels(value, 1) {
            return Err(anyhow!("heavily nested, cannot flatten this JSON"));
de-sh (Contributor) commented Dec 30, 2024

Why error out? Why not flatten only up to a certain depth and keep the remaining values as they are?

Contributor

i.e. {"a": [{"b":{"c":{"d":["e","f"]}}}, {"c": {}}]} ~> [{"a":{"b":{"c":{"d":["e","f"]}}}}, {"a":{"c": {}}}]
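
One way to read the suggestion above is as a fan-out: an array-valued field produces one record per element, and anything nested inside each element is left untouched. The sketch below illustrates that reading only. The `Json` enum is a std-only stand-in for `serde_json::Value`, and `fan_out` and `obj` are hypothetical names that do not appear in the pull request.

```rust
/// Std-only stand-in for `serde_json::Value` (assumption, for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Hypothetical helper: builds a single-key object.
fn obj(key: &str, value: Json) -> Json {
    Json::Object(vec![(key.to_string(), value)])
}

/// Splits a record on its first array-valued field, emitting one record
/// per array element. Nested values inside each element are kept as-is.
/// Assumptions: only the first array field is expanded, and an empty
/// array yields no records.
fn fan_out(record: &Json) -> Vec<Json> {
    // Non-objects pass through unchanged.
    let Json::Object(fields) = record else {
        return vec![record.clone()];
    };
    // Locate the first field whose value is an array.
    let Some((idx, items)) = fields.iter().enumerate().find_map(|(i, (_, v))| match v {
        Json::Array(items) => Some((i, items)),
        _ => None,
    }) else {
        return vec![record.clone()];
    };
    items
        .iter()
        .map(|item| {
            // Copy the record and replace the array with a single element.
            let mut copy = fields.clone();
            copy[idx].1 = item.clone();
            Json::Object(copy)
        })
        .collect()
}
```

With this approach a deeply nested document never has to be rejected; the cost is that deep values stay as nested JSON rather than becoming flat columns.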

Comment on lines +76 to +100
    let size: usize = body.len();
    let mut parsed_timestamp = Utc::now().naive_utc();
    if time_partition.is_none() {
        if custom_partition.is_none() {
            let size = size as u64;
            create_process_record_batch(
                stream_name,
                req,
                body_val,
                static_schema_flag.as_ref(),
                None,
                parsed_timestamp,
                &HashMap::new(),
                size,
                schema_version,
            )
            .await?;
        } else {
            let data = convert_array_to_object(
                body_val,
                None,
                None,
                custom_partition.as_ref(),
                schema_version,
            )?;
de-sh (Contributor) commented Jan 5, 2025

Would the following fix the issue of not wanting to convert array to object?

    let data = if time_partition.is_some() || custom_partition.is_some() {
        convert_array_to_object(
            body_val,
            time_partition.as_ref(),
            time_partition_limit,
            custom_partition.as_ref(),
            schema_version,
        )?
    } else {
        vec![body_val]
    };

Contributor Author

I need to test all the scenarios before I can confirm the changes. As discussed, let's take up the optimisation of this one as a separate issue.

@nitisht nitisht merged commit 450bac2 into parseablehq:main Jan 5, 2025
9 checks passed
parmesant pushed a commit to parmesant/parseable that referenced this pull request Jan 13, 2025
hierarchical json flattening restriction
get the level of hierarchy from the json
perform generic flattening only if level of nesting is <=4

---------

Co-authored-by: Devdutt Shenoi <[email protected]>
@nikhilsinhaparseable nikhilsinhaparseable deleted the flattening-validation branch July 12, 2025 08:57
4 participants