Skip to content

Conversation

Aarsh-Wankar
Copy link
Contributor

@Aarsh-Wankar Aarsh-Wankar commented Feb 13, 2025

Description

An extra if condition is added to the __call__ function of the Function class to check if when pytorch.function is called, the length of the input list is zero. In that case, trust_input variable is set to True, thus preventing the execution of a few loops, saving time.

Tests:

Test code:

import pytensor
import pytensor.tensor as pt
import timeit
fn = pytensor.function([], pt.zeros((5)))
print(timeit.timeit(fn, number=100000))
# time before adding condition:  2.59 µs ± 238 ns per loop (mean ± std. dev. of 10 runs, 100000 loops each)
# time after adding condition:  1.95 µs ± 143 ns per loop (mean ± std. dev. of 10 runs, 100000 loops each)
fn.trust_input=True
print(timeit.timeit(fn, number=100000))
# time before adding condition:  2.29 µs ± 304 ns per loop (mean ± std. dev. of 10 runs, 100000 loops each)
# time after adding condition:  2.04 µs ± 173 ns per loop (mean ± std. dev. of 10 runs, 100000 loops each)

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pytensor--1206.org.readthedocs.build/en/1206/

Comment on lines 881 to 882
if len(args) == 0: # for speed we trust the input for empty args
trust_input = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be done once in the __init__, not in every __call__

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. I'll change it. Also, yes, I'll take on #1116 as well. Can I make both changes with the same commit? In that case, will these changes work?

class Function:
    ...
    def __init__(
        self,
        vm: "VM",
        input_storage: list[Container],
        output_storage: list[Container],
        indices,
        outputs,
        defaults,
        unpack_single: bool,
        return_none: bool,
        output_keys,
        maker: "FunctionMaker",
+       trust_input: bool = False,
        name: str | None = None,
    ):
        """
        Parameters
        ----------
        ...
        maker
            The `FunctionMaker` that created this instance.
+       trust_input
+           A boolean variable which indicates whether or not to perform checks on the input. If true, we do not check the input parameter
        name
            A string name.
        """
        self.vm = vm
        self.input_storage = input_storage
        self.output_storage = output_storage
        self.indices = indices
        self.outputs = outputs
        self.defaults = defaults
        self.unpack_single = unpack_single
        self.return_none = return_none
        self.maker = maker
        self.profile = None  # reassigned in FunctionMaker.create
-       self.trust_input = False  # If True, we don't check the input parameter
+      self.trust_input = trust_input # If True, we don't check the input parameter
        self.name = name
        self.nodes_with_inner_function = []
        self.output_keys = output_keys

        if self.output_keys is not None:
            warnings.warn("output_keys is deprecated.", FutureWarning)

        assert len(self.input_storage) == len(self.maker.fgraph.inputs)
        assert len(self.output_storage) == len(self.maker.fgraph.outputs)
+      if len(self.input_storage) == 0:
+            self.trust_input = True # trust the input in case the input is empty for speed.
        self.has_defaults = any(refeed for _, refeed, _ in self.defaults)

I will remove the following block from the __call__ function (from my previous commit).

if len(args) == 0:  # for speed we trust the input for empty args
        trust_input = True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to check if share variables don't count as inputs here since they don't matter for our consideration of trust input. Check the docs for shared variables.

Also we would want a test that it is indeed set to true when we want and not otherwise.

Yes it can be a single commit since changes are related, but can also be two if you prefer. Should also test the passing trust input directly

@ricardoV94 ricardoV94 added the enhancement New feature or request label Feb 14, 2025
@ricardoV94
Copy link
Member

While you're at it do you want to tackle #1116 as well?

@Aarsh-Wankar
Copy link
Contributor Author

@ricardoV94 I added the extra argument trust_input in pytensor.function and the intermediate functions too, all the way to the arguments of Function class. I check if the length of input (the one given as an input by the user) is zero in pytensor.function, and set trust_input to True in that case. Shared variables are added to the input list during the execution of pfunc, so there are no shared variables in the initial input list passed to pytensor.function.

Test code

import pytensor
import pytensor.tensor as pt
import timeit
import numpy as np
fn = pytensor.function([], pt.zeros((5)), trust_input=False)
# trust_input is still set to True, as input list is empty.
ar1 = []
ar2 = []
for i in range(10):
    ar1.append(timeit.timeit(fn, number=100000))
print(np.array(ar1).mean(), np.array(ar1).std())
# Output: 0.18654848000151106 0.0037660645585986434

fn.trust_input=False

for i in range(10):
    ar2.append(timeit.timeit(fn, number=100000))
print(np.array(ar2).mean(), np.array(ar2).std()) 
# Output: 0.22520774999284185 0.011584992775607648

This also closes #1116.

Are any more changes required?

@ricardoV94
Copy link
Member

@Aarsh-Wankar sounds like you pulled the main branch with merge (instead of rebase or --ff) which makes it hard to see only the differences introduced by this PR: https://github.com/pymc-devs/pytensor/pull/1206/files

Con you rebase from main instead (and then force push here). If you are unsure, do checkout your branch into a backup branch so you can easily restore it if things go south.

@Aarsh-Wankar
Copy link
Contributor Author

@ricardoV94 Sorry for the mistake; this is the first time I am contributing to an open-source repository 😅. Also, thanks for your suggestion, I rebased it to main and force pushed the changes. There are only four commits now. Please let me know if anything needs to be changed.

Comment on lines 169 to 171
If True, the inputs are trusted to be correct. This is used to avoid
the overhead of checking the inputs for correctness. This should only
be used if the inputs are guaranteed to be correct.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If True, the inputs are trusted to be correct. This is used to avoid
the overhead of checking the inputs for correctness. This should only
be used if the inputs are guaranteed to be correct.
If True, no input validation checks are performed when the function is called. This includes checking the number of inputs, their types and that multiple inputs are not aliased to each other. Failure to meet any of these conditions can lead to computational errors or to the interpreter crashing.

on_unused_input
What to do if a variable in the 'inputs' list is not used in the graph.
Possible values are 'raise', 'warn', 'ignore' and None.
trust_input: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
trust_input: bool
trust_input: bool, default False

Comment on lines 1312 to 1314
If True, the inputs are trusted to be correct. This is used to avoid
the overhead of checking the inputs for correctness. This should only
be used if the inputs are guaranteed to be correct.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy the final docstring after adjusting to my comment above.

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a test for the no inputs case and the allowing passing the keyword argument. The easiest way to check the second case is to repurpose one of the existing tests.

@Aarsh-Wankar
Copy link
Contributor Author

@ricardoV94 I added the tests, but some checks seem to be failing. Could you please take a look?

@ricardoV94
Copy link
Member

Pre-commit is failing you should be able to reproduce locally if you install it

@Aarsh-Wankar
Copy link
Contributor Author

Yes, I fixed that; apart from that, a few other tests failed because I didn't add the trust_input parameter to the _Maker class in the DebugMode. I fixed that, and now only three tests fail. All of them seem to be segmentation faults in tests/tensor. I can't make out how my changes affect these files.

@ricardoV94
Copy link
Member

All of them seem to be segmentation faults in tests/tensor. I can't make out how my changes affect these files.

That's the sort of stuff you seen when you trust input but shouldn't

@ricardoV94
Copy link
Member

I made a mistake. trust_input has an effect when there are shared inputs, which are implicit inputs. So our logic of checking if len(inputs)==0 is wrong.

More importantly, it means the original issue didn't make sense. A function without explicit and implicit inputs is not something we actually care about. That would be a constant function. No need to optimize for that case.

Here is an example:

import pytensor

x = pytensor.shared(5.0)
out = x + 1
fn = pytensor.function([], out)
fn.trust_input  # True (after this PR)

However the shared inputs live in the input_storage which is checked when trust_input=False

fn.input_storage  # [<array(5.)>]

So for this PR let's just keep the option to set trust_input explicitly, but remove the default of True when len(inputs)==0.

Sorry for the trouble!

@Aarsh-Wankar
Copy link
Contributor Author

Oh, not at all. It was fun, and I learned a lot! 😀 Sure, I'll remove the default trust_input optimization.

Copy link

codecov bot commented Feb 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.99%. Comparing base (69efc68) to head (d13584a).
Report is 142 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1206   +/-   ##
=======================================
  Coverage   81.99%   81.99%           
=======================================
  Files         188      188           
  Lines       48551    48553    +2     
  Branches     8673     8673           
=======================================
+ Hits        39810    39812    +2     
  Misses       6579     6579           
  Partials     2162     2162           
Files with missing lines Coverage Δ
pytensor/compile/debugmode.py 61.55% <100.00%> (+0.03%) ⬆️
pytensor/compile/function/pfunc.py 82.92% <ø> (ø)
pytensor/compile/function/types.py 80.71% <100.00%> (+0.02%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ricardoV94 ricardoV94 changed the title Automatically set trust_input=True when function has no inputs Allow passing trust_input to function Feb 28, 2025
@ricardoV94 ricardoV94 merged commit 5b82a40 into pymc-devs:main Feb 28, 2025
73 checks passed
@ricardoV94
Copy link
Member

Thanks @Aarsh-Wankar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automatically set trust_input=True when function has no inputs Allow passing trust_input to pytensor.function

2 participants