Collaborator

@lumburovskalina commented Oct 28, 2025

Comment on lines +357 to +375
embedding_agg AS (
    SELECT
        e.project_id,
        (
            SELECT jsonb_object_agg(
                ebs.state,
                jsonb_build_object(
                    'count', ebs.count,
                    'embeddings', ebs.embeddings
                )
            )
            FROM embeddings_by_state ebs
            WHERE ebs.project_id = e.project_id
        ) AS embeddings_by_state
    FROM embedding e
    LEFT JOIN project p ON p.id = e.project_id
    GROUP BY e.project_id
),
Member

Why do we need to join with project? p isn't referenced anywhere in this CTE.
Also, if we only want the aggregated embeddings object, embeddings_by_state and embedding_agg could be a single query by wrapping the per-state aggregation in an outer SELECT.
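A minimal sketch of the single-query shape, runnable with Python's built-in sqlite3. SQLite's json_group_object, json_object and json_group_array stand in for PostgreSQL's jsonb_object_agg and jsonb_build_object, and the embedding table with a name column is a hypothetical, trimmed-down schema. The inner SELECT aggregates per (project_id, state); the outer SELECT folds those rows into one object per project, with no join to project.

```python
import json
import sqlite3

# Hypothetical, trimmed-down schema: only what the aggregation needs.
SCHEMA = """
CREATE TABLE embedding (id INTEGER PRIMARY KEY, project_id TEXT, state TEXT, name TEXT);
"""

# One query instead of embeddings_by_state + embedding_agg:
# the inner SELECT aggregates per (project_id, state), the outer SELECT
# builds one {state: {count, embeddings}} object per project.
EMBEDDING_AGG_SQL = """
SELECT s.project_id,
       json_group_object(
           s.state,
           -- json() marks the array text as JSON so it is nested, not quoted.
           json_object('count', s.cnt, 'embeddings', json(s.embeddings))
       ) AS embeddings_by_state
FROM (
    SELECT project_id, state, COUNT(*) AS cnt, json_group_array(name) AS embeddings
    FROM embedding
    GROUP BY project_id, state
) s
GROUP BY s.project_id
"""


def embeddings_by_project(conn: sqlite3.Connection) -> dict:
    """Return {project_id: {state: {"count": n, "embeddings": [...]}}}."""
    rows = conn.execute(EMBEDDING_AGG_SQL).fetchall()
    return {project_id: json.loads(agg) for project_id, agg in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO embedding (project_id, state, name) VALUES (?, ?, ?)",
        [("p1", "FINISHED", "emb-a"), ("p1", "FAILED", "emb-b"), ("p2", "FINISHED", "emb-c")],
    )
    print(embeddings_by_project(conn))
```

The same wrapping applies to attribute_agg and record_tokenization_task_agg.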

Comment on lines +398 to +417
attribute_agg AS (
    SELECT
        a.project_id,
        (
            SELECT jsonb_object_agg(
                abs.state,
                jsonb_build_object(
                    'count', abs.count,
                    'attributes', abs.attributes
                )
            )
            FROM attribute_by_state abs
            WHERE abs.project_id = a.project_id
        ) AS attributes_by_state
    FROM attribute a
    LEFT JOIN project p ON p.id = a.project_id
    WHERE a.state NOT IN ('UPLOADED','AUTOMATICALLY_CREATED')
    GROUP BY a.project_id
),
Member

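Same as above: the join with project isn't needed, and attribute_by_state and attribute_agg could be merged into one query.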

Comment on lines +438 to +456
record_tokenization_task_agg AS (
    SELECT
        rtt.project_id,
        (
            SELECT jsonb_object_agg(
                rtts.state,
                jsonb_build_object(
                    'count', rtts.count,
                    'record_tokenization_tasks', rtts.record_tokenization_tasks
                )
            )
            FROM record_tokenization_tasks_by_state rtts
            WHERE rtts.project_id = rtt.project_id
        ) AS record_tokenization_tasks_by_state
    FROM record_tokenization_task rtt
    LEFT JOIN project p ON p.id = rtt.project_id
    GROUP BY rtt.project_id
),
Member

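Same as above: the join with project isn't needed, and record_tokenization_tasks_by_state and record_tokenization_task_agg could be merged into one query.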

Comment on lines +476 to +478
LEFT JOIN embedding_agg ea ON ea.project_id = i.project_id
LEFT JOIN attribute_agg aa ON aa.project_id = i.project_id
LEFT JOIN record_tokenization_task_agg rtt ON rtt.project_id = i.project_id
Member

We usually put the ON clause on the next line, e.g.:
LEFT JOIN embedding_agg ea
    ON ea.project_id = i.project_id

p.name AS project_name
FROM organization o
JOIN integration_data i ON i.organization_id = o.id
LEFT JOIN project p ON p.id = i.project_id
Member

AFAIK this doesn't need to be a LEFT JOIN, since a project should always exist for an integration.

Also, we could build this directly in the previous integration_data CTE by joining organization and project there.

JOIN organization o ON o.id = md.organization_id
WHERE
    mf.created_at >= '{created_at_from}'
    AND mf.state IN ({', '.join(f"'{state}'" for state in states)})
Member

This could fail if states is an empty list, because the query would contain IN (), which PostgreSQL rejects as a syntax error. Maybe add an early return, since without any state the result would be empty anyway.
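A minimal sketch of the early return. The function name, the run_query callable, and the table name are hypothetical; only the states/created_at_from filter mirrors the PR. Returning before the SQL is built means an empty list never reaches the database.

```python
from typing import Any, Callable, List


def get_rows_by_state(
    states: List[str],
    created_at_from: str,
    run_query: Callable[[str], List[Any]],  # hypothetical DB helper
) -> List[Any]:
    # Early return: with no states nothing can match, and "IN ()" would be
    # a syntax error in PostgreSQL anyway.
    if not states:
        return []
    # Assumes each state was already sanitized individually.
    states_sql = ", ".join(f"'{state}'" for state in states)
    query = (
        "SELECT mf.id FROM some_table mf "  # hypothetical table name
        f"WHERE mf.created_at >= '{created_at_from}' "
        f"AND mf.state IN ({states_sql})"
    )
    return run_query(query)
```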

    created_at_to: Optional[str] = None,
) -> List[Any]:

    states = prevent_sql_injection(states, isinstance(states, list))
Member

AFAIK we need to run prevent_sql_injection on every string with a list comprehension, not on the whole list.
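A minimal sketch of the per-element check. The prevent_sql_injection below is a hypothetical stand-in that only mimics the call shape seen in the PR (a value plus a boolean condition); the project's real helper may work differently. It shows why checking each string matters: with this stand-in, the `in` checks on a list become membership tests and miss a quote hidden inside one element.

```python
from typing import List


def prevent_sql_injection(value: str, condition: bool = True) -> str:
    """Hypothetical stand-in for the project's helper of the same name.

    Rejects values that fail the caller's type check or contain characters
    that could end a quoted SQL literal or start a comment.
    """
    if not condition:
        raise ValueError(f"unexpected value: {value!r}")
    if "'" in value or ";" in value or "--" in value:
        raise ValueError(f"suspicious SQL input: {value!r}")
    return value


def sanitize_each(values: List[str]) -> List[str]:
    # Validate every string separately. With this stand-in, passing the
    # whole list would run the `in` checks as list-membership tests, so a
    # quote inside one element would not be detected.
    return [prevent_sql_injection(v, isinstance(v, str)) for v in values]
```

In the PR this would read as `states = [prevent_sql_injection(s, isinstance(s, str)) for s in states]`, and likewise for step_types.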



def get_last_chat_messages(
    message_type: str,
Member

We could declare the enum type in the request body directly, so FastAPI parses it and we can compare without .value.
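A minimal stdlib-only sketch of the idea. The MessageType members and the wants_questions helper are hypothetical. When the body field is typed as the enum, FastAPI (via Pydantic) converts the incoming string into an enum member during validation, which is what `MessageType("QUESTION")` does here, so the handler can compare members directly.

```python
from enum import Enum


class MessageType(Enum):
    # Hypothetical members; the real enum in the codebase may differ.
    QUESTION = "QUESTION"
    ANSWER = "ANSWER"


# With FastAPI the body model would declare the enum type (sketch only,
# requires fastapi/pydantic):
#
#     class LastChatMessagesBody(BaseModel):
#         message_type: MessageType
#
# Validation then turns the incoming string into a MessageType member.


def wants_questions(message_type: MessageType) -> bool:
    # Compare enum members directly; no .value needed on either side.
    return message_type is MessageType.QUESTION


if __name__ == "__main__":
    parsed = MessageType("QUESTION")  # what request validation would produce
    print(wants_questions(parsed))
```

An invalid value such as "UNKNOWN" fails conversion, so FastAPI would reject the request with a validation error before the handler runs.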

    created_at_to: Optional[str] = None,
) -> List[Any]:

    step_types = prevent_sql_injection(step_types, isinstance(step_types, list))
Member

See above: each step type needs to be checked individually.

if created_at_to:
    created_at_to_filter = f"AND ss.created_at <= '{created_at_to}'"

step_types_sql = ", ".join([f"'{st}'" for st in step_types])
Member

This breaks for an empty list, since it would produce IN (), so an early return is recommended.

Comment on lines +154 to +158
AND EXISTS (
    SELECT 1
    FROM jsonb_array_elements(template_config->'steps') t
    WHERE t->>'stepType' IN ({step_types_sql})
)
Member

EXISTS filters are usually slow, and they run multiple times in the query. Maybe we can preselect the step type (single value) and the template step types (array) in the WITH query, and then filter on what's needed in a surrounding query.
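A minimal sketch of that idea, runnable with Python's sqlite3. The step table, its columns, and the step type names are hypothetical. Because SQLite has no array type, this stand-in unnests each template's steps into rows once in the WITH query (via json_each), next to the step's own step_type, and the surrounding query filters on those columns; in PostgreSQL the template step types could instead stay an array and be filtered with an array operator such as &&. The step types are bound as parameters here, which differs from the f-string approach in the PR.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical table: each step has its own type plus a JSON template config.
SCHEMA = "CREATE TABLE step (id INTEGER PRIMARY KEY, step_type TEXT, template_config TEXT);"


def find_steps(conn: sqlite3.Connection, step_types: List[str]) -> List[Tuple[int, str]]:
    # Early return: an empty list would otherwise produce "IN ()".
    if not step_types:
        return []
    placeholders = ", ".join("?" for _ in step_types)
    query = f"""
        WITH step_with_template_types AS (
            -- Unnest the template steps once instead of a correlated EXISTS per row.
            SELECT s.id,
                   s.step_type,
                   json_extract(t.value, '$.stepType') AS template_step_type
            FROM step s, json_each(s.template_config, '$.steps') t
        )
        SELECT DISTINCT id, step_type
        FROM step_with_template_types
        WHERE template_step_type IN ({placeholders})
        ORDER BY id
    """
    return conn.execute(query, step_types).fetchall()
```

DISTINCT collapses a step whose template matches several requested types back into a single row.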


3 participants