Skip to content

Conversation

@skhrg
Copy link
Member

@skhrg skhrg commented Dec 6, 2025

Closes #1483

@skhrg skhrg requested a review from mhasself December 6, 2025 16:05
@mhasself
Copy link
Member

mhasself commented Dec 8, 2025

Thank you! This looks good I just want to consider the interface for a few minutes. Specifically the verbs create commit_ make_ create_ ...

Can we just add a commit=True option to create_job?

And the check of existing should probably occur on commit rather than creation.

@skhrg
Copy link
Member Author

skhrg commented Dec 18, 2025

Can we just add a commit=True option to create_job?

Sure, so we just have create_job and commit_jobs?

@mmccrackan
Copy link
Contributor

Was borrowing some things from this for #1348 since trying to add all the jobs for each obs, wafer, and band up front took a while. I was testing some different approaches and was wondering if we could do this:

with self.session_scope() as session:
        session.add_all(jobs)
        session.flush()
        for job in jobs:
            job.id
            session.expunge(job)
        session.commit()

instead of:

with self.session_scope() as session:
      for job in jobs:
          session.add(job)
      session.commit()
      for job in jobs:
          job.id
          session.expunge(job)

The first one is significantly faster than the second one. Can do the check for existing before the with self.session_scope() as session line in another loop.

@skhrg
Copy link
Member Author

skhrg commented Dec 22, 2025

I like that a lot, let me run some tests and then update the PR.

@skhrg
Copy link
Member Author

skhrg commented Dec 23, 2025

Hmm having the commit bellow expunge seems to cause errors:

  job = session.get(Job, job_id)                                                                                                                                                                                                                               
Traceback (most recent call last):                                                                                                                                                                                                                             
  File "/global/common/software/sobs/perlmutter/conda_envs/soconda_20251027_0.2.2/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3291, in expunge                                                                                               
    state = attributes.instance_state(instance)                                                                                                                                                                                                                
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                                
AttributeError: 'NoneType' object has no attribute '_sa_instance_state'                                                                                                                                                                                        
                                                                                                                                                                                                                                                               
The above exception was the direct cause of the following exception:                                                                                                                                                                                           
                                                                                                                                                                                                                                                               
Traceback (most recent call last):                                                                                                                                                                                                                             
  File "/global/u1/s/skh/code/LATR_Validation/pipelines/lat_beams/make_source_map.py", line 424, in <module>                                                                                                                                                   
    job = jdb.lock(j.id)                                                                                                                                                                                                                                       
          ^^^^^^^^^^^^^^                                                                                                                                                                                                                                       
  File "/global/homes/s/skh/.local/soconda/20251027_0.2.2/lib/python3.12/site-packages/sotodlib/site_pipeline/jobdb.py", line 323, in lock                                                                                                                     
    session.expunge(job)                                                                                                                                                                                                                                       
  File "/global/common/software/sobs/perlmutter/conda_envs/soconda_20251027_0.2.2/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3293, in expunge                                                                                               
    raise exc.UnmappedInstanceError(instance) from err                                                                                                                                                                                                         
sqlalchemy.orm.exc.UnmappedInstanceError: Class 'builtins.NoneType' is not mapped    

Also just watching the file seems like things arent really writing, I'll dig into this a bit more.

@mmccrackan
Copy link
Contributor

Hmm interesting. This is the full version of your commit function that I edited that seems to be working for me just in case it helps (the existence check may need improvement). I load back the jobdb made using this and it looks fully populated.

 def commit_jobs(self, jobs):
        """Commit jobs to the jobs table"""
        existing_jobs = self.get_jobs(jstate=JState.all(), locked=None)

        job_keys = {
            (job.jclass, frozenset(job.tags.items())) for job in jobs
        }

        for existing_job in existing_jobs:
            key = (existing_job.jclass, frozenset(existing_job.tags.items()))

            if key in job_keys:
                raise JobNotUniqueError(
                    f"Found other records with same tags: {existing_job.id}"
                )

        with self.session_scope() as session:
            session.add_all(jobs)
            session.flush()
            for job in jobs:
                job.id
                session.expunge(job)
            session.commit()

@skhrg
Copy link
Member Author

skhrg commented Dec 23, 2025

It was the missing flush. I think this is good to go pending where we want the existence check to go. I'm happy to move it to commit_jobs if thats what people think is best (in my use patterns I disable the check anyways).

@mmccrackan
Copy link
Contributor

On the existence check, I found that calling get_jobs in a loop for large numbers of jobs (>50,000 for an ISO-like preprocessing job over obs, wafers, and bands) is quite a bit slower than extracting all of the tags and jclasses at once and comparing those. Though in that case I think I can also just disable the check like you do since that is done up-front anyway.

@skhrg
Copy link
Member Author

skhrg commented Dec 23, 2025

Yeah I found the same thing, in my usecase I produce a dict of jobs upfront and use that to check existence in the loop (https://github.com/simonsobs/LATR_Validation/blob/main/pipelines/lat_beams/lat_beams/utils.py#L187). The problem I have with frozenset(job.tags.items()) as the key is that I don't always want all the tags to be used as the identifier (for example I store a serialized copy of my configuration and if I run with a new config I want to update the existing job not make a new one).

Some solutions to that are:

  1. Allow some more generic function to make the keys like I do in my linked function above. I think this may make the jobdb class more complex then it needs to be since making this generic enough for all use patterns may be tough.
  2. Pass in a list of tags to use as the keys when checking for job existence. This probably covers most people's use cases and can always be turned off for those who need something more advanced. Downside here is that its something that sort of trivially has unhandled edge cases, but maybe that's fine.
  3. Leave it slow with the assumption that advanced users doing batch adds will know to do a faster check ahead of time and people doing one offs don't care.
  4. Remove the existence check (or turn it off by default). In this case if the job already exists would it just add an extra job? I think that's fine honestly.

I personally think 3 or 4 is the way to go for now, with 1 or 2 as a possibility for a future PR depending on how the user base expands. If we go with 3 or 4 then we should probably drop an example of checking existence ahead of time in the docs.

@mmccrackan
Copy link
Contributor

Yeah I also think 3 sounds like the best solution probably.

@skhrg skhrg requested a review from mmccrackan December 23, 2025 19:15
@skhrg
Copy link
Member Author

skhrg commented Dec 23, 2025

Great. @mhasself thoughts before this is ready to merge?

Copy link
Contributor

@mmccrackan mmccrackan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in #1348 and it works great and has really helped speed that up, but will leave the final say to @mhasself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

jobdb: Add option to not write to database when running create_job

4 participants