
Middle products for arb_poly, acb_poly, nmod_poly, gr_poly #2613

Merged
fredrik-johansson merged 8 commits into flintlib:main from fredrik-johansson:mulmid2 on Mar 18, 2026

@fredrik-johansson
Collaborator

Adapt the existing mullow algorithms to compute middle products, and use them in Newton iterations.
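As a point of reference, the middle product of $A$ and $B$ is a window of coefficients of $AB$, here taken as coefficients lo through hi - 1 (conventions for the bounds vary). The minimal schoolbook sketch below computes only that window. The name mulmid_naive, its argument order and the int64_t coefficient type are hypothetical choices for illustration, not the FLINT signature.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the FLINT API): writes coefficients lo, ..., hi-1
   of A*B to res[0], ..., res[hi-lo-1], computing only that window.
   Schoolbook arithmetic on int64_t; overflow is not checked. */
static void mulmid_naive(int64_t *res, const int64_t *A, long lenA,
                         const int64_t *B, long lenB, long lo, long hi)
{
    assert(lenA >= 1 && lenB >= 1 && 0 <= lo && lo <= hi);
    for (long k = lo; k < hi; k++)
    {
        /* coefficient k of A*B is the sum of A[i]*B[k-i] over valid i */
        long i_min = (k - (lenB - 1) > 0) ? k - (lenB - 1) : 0;
        long i_max = (k < lenA - 1) ? k : lenA - 1;
        int64_t s = 0;
        for (long i = i_min; i <= i_max; i++)
            s += A[i] * B[k - i];
        res[k - lo] = s;
    }
}
```

With $A = 1 + 2x + 3x^2$ and $B = 4 + 5x + 6x^2$ we have $AB = 4 + 13x + 28x^2 + 27x^3 + 18x^4$, so lo = 2, hi = 4 yields 28 and 27.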

Timings for _arb_poly_inv_series inverting $\sum x^k / (1 + k^2)$ to length $N$ at a precision of prec bits:

 prec     N         old         new  speedup
  128     4    1.37e-07    1.27e-07    1.079
  128     8    3.98e-07    4.03e-07    0.988
  128    16     2.4e-06    1.85e-06    1.297
  128    32    8.65e-06    6.45e-06    1.341
  128    64    3.21e-05    2.39e-05    1.343
  128   128    0.000121    0.000103    1.175
  128   256    0.000305    0.000271    1.125
  128   512    0.000782    0.000694    1.127
  128  1024     0.00211      0.0018    1.172
  128  2048     0.00365     0.00323    1.130
  128  4096     0.00691     0.00629    1.099
  128  8192      0.0136      0.0126    1.079
 1024     4    2.35e-07    2.38e-07    0.987
 1024     8    1.91e-06    1.89e-06    1.011
 1024    16    1.46e-05    1.19e-05    1.227
 1024    32    7.64e-05    7.17e-05    1.066
 1024    64    0.000246     0.00023    1.070
 1024   128     0.00061    0.000586    1.041
 1024   256     0.00137     0.00132    1.038
 1024   512     0.00303      0.0029    1.045
 1024  1024     0.00675     0.00646    1.045
 1024  2048      0.0135      0.0128    1.055
 1024  4096      0.0282      0.0272    1.037
 1024  8192      0.0607      0.0589    1.031
 8192     4    6.35e-07    6.32e-07    1.005
 8192     8    3.33e-05    3.32e-05    1.003
 8192    16    0.000275    0.000274    1.004
 8192    32    0.000897     0.00088    1.019
 8192    64      0.0022     0.00216    1.019
 8192   128     0.00486     0.00478    1.017
 8192   256      0.0106      0.0103    1.029
 8192   512      0.0232      0.0224    1.036
 8192  1024      0.0508      0.0498    1.020
 8192  2048       0.111       0.106    1.047
 8192  4096       0.235       0.225    1.044
 8192  8192       0.517       0.505    1.024
65536     4    4.11e-06     4.1e-06    1.002
65536     8    0.000529     0.00053    0.998
65536    16     0.00331     0.00326    1.015
65536    32      0.0087     0.00851    1.022
65536    64      0.0204      0.0201    1.015
65536   128      0.0459      0.0455    1.009
65536   256      0.0996      0.0993    1.003
65536   512       0.214        0.21    1.019
65536  1024       0.472       0.455    1.037
65536  2048       1.061       1.028    1.032
65536  4096       2.311       2.273    1.017
65536  8192       5.129        5.02    1.022
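To make the benchmark input concrete, the sketch below builds $f(x) = \sum_k x^k/(1+k^2)$ and inverts it in plain double precision with the classical $O(N^2)$ recurrence $g_0 = 1/f_0$, $g_m = -\frac{1}{f_0}\sum_{k=1}^{m} f_k g_{m-k}$. It is only a reference for what is being computed: it uses neither ball arithmetic nor Newton iteration, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: f[k] = 1/(1 + k^2) for k = 0, ..., n-1. */
static void benchmark_series(double *f, size_t n)
{
    for (size_t k = 0; k < n; k++)
        f[k] = 1.0 / (1.0 + (double) k * (double) k);
}

/* Hypothetical helper: g = 1/f mod x^n via the O(n^2) recurrence
   g[0] = 1/f[0], g[m] = -(1/f[0]) * sum_{k=1}^{m} f[k] * g[m-k].
   Plain doubles: no error bounds, unlike arb/acb ball arithmetic. */
static void inv_series_basecase(double *g, const double *f, size_t n)
{
    assert(n >= 1 && f[0] != 0.0);
    double c = 1.0 / f[0];
    g[0] = c;
    for (size_t m = 1; m < n; m++)
    {
        double s = 0.0;
        for (size_t k = 1; k <= m; k++)
            s += f[k] * g[m - k];
        g[m] = -c * s;
    }
}
```

For this input the inverse begins $1 - x/2 + x^2/20 + \cdots$.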

Timings for _acb_poly_inv_series inverting $\sum x^k [1/ (1 + k^2) + i / (2 + k^2)]$ to length $N$ at a precision of prec bits:

 prec     N         old         new  speedup
  128     4    1.08e-06    9.39e-07    1.150
  128     8    2.58e-06     2.6e-06    0.992
  128    16    1.05e-05    7.93e-06    1.324
  128    32    3.64e-05    2.73e-05    1.333
  128    64    0.000135      0.0001    1.350
  128   128    0.000506    0.000439    1.153
  128   256     0.00135      0.0012    1.125
  128   512     0.00339     0.00298    1.138
  128  1024     0.00895     0.00769    1.164
  128  2048      0.0158       0.014    1.129
  128  4096      0.0296      0.0267    1.109
  128  8192       0.058       0.053    1.094
 1024     4     4.6e-06    4.59e-06    1.002
 1024     8    1.79e-05    1.78e-05    1.006
 1024    16    9.39e-05    7.45e-05    1.260
 1024    32     0.00038    0.000344    1.105
 1024    64     0.00111     0.00104    1.067
 1024   128     0.00268     0.00261    1.027
 1024   256     0.00607     0.00583    1.041
 1024   512      0.0132      0.0127    1.039
 1024  1024      0.0297       0.028    1.061
 1024  2048      0.0586      0.0562    1.043
 1024  4096       0.121       0.118    1.025
 1024  8192       0.257       0.249    1.032
 8192     4    0.000103    0.000102    1.010
 8192     8    0.000415    0.000415    1.000
 8192    16     0.00177     0.00172    1.029
 8192    32      0.0045     0.00439    1.025
 8192    64      0.0101        0.01    1.010
 8192   128      0.0221      0.0215    1.028
 8192   256      0.0468      0.0462    1.013
 8192   512       0.104       0.099    1.051
 8192  1024       0.221       0.217    1.018
 8192  2048       0.464       0.456    1.018
 8192  4096       0.981       0.966    1.016
 8192  8192       2.178       2.162    1.007
65536     4      0.0015     0.00149    1.007
65536     8     0.00636     0.00634    1.003
65536    16      0.0214      0.0202    1.059
65536    32      0.0455      0.0435    1.046
65536    64       0.096      0.0934    1.028
65536   128       0.204       0.201    1.015
65536   256       0.438       0.431    1.016
65536   512       0.929       0.918    1.012
65536  1024       2.018       1.982    1.018
65536  2048       4.517        4.42    1.022
65536  4096       9.787       9.695    1.009
65536  8192      21.606      21.451    1.007

The timings should improve a bit more with better tuning (I'm conservatively reusing the mullow parameters for now) and with better middle product code in _fmpz_poly_mulmid.

@fredrik-johansson changed the title from "Middle products for arb_poly, acb_poly" to "Middle products for arb_poly, acb_poly, nmod_poly, gr_poly" on Mar 18, 2026
@fredrik-johansson
Collaborator Author

I've updated the PR with middle product code for nmod_poly and gr_poly.

Generic Newton-based algorithms (_gr_poly_inv_series, _gr_poly_div_series, _gr_poly_rsqrt_series, _gr_poly_sqrt_series, _gr_poly_exp_series, etc.) now use middle products when applicable.
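The reciprocal is the simplest case of where a middle product enters a Newton iteration. If $g \equiv 1/f \pmod{x^m}$, then $fg = 1 + x^m E$, and with $m' = \min(2m, N)$ the update $g \leftarrow g - x^m (gE \bmod x^{m'-m})$ gives $g \equiv 1/f \pmod{x^{m'}}$. Since the low $m$ coefficients of $fg$ are already known to be $1, 0, \ldots, 0$, only coefficients $m, \ldots, m'-1$ are needed, and that is exactly a middle product. The C sketch below shows this with schoolbook arithmetic modulo an assumed prime $p = 998244353$ (chosen below $2^{32}$ so products fit in uint64_t). All names are hypothetical; this is not the FLINT code, which adapts the fast mullow algorithms instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a prime modulus below 2^32, so a product of two residues
   fits in uint64_t (64-bit moduli would need 128-bit products). */
#define MOD_P 998244353u

static uint64_t mulmod(uint64_t a, uint64_t b) { return (a * b) % MOD_P; }

/* a^e mod MOD_P by binary exponentiation */
static uint64_t powmod(uint64_t a, uint64_t e)
{
    uint64_t r = 1;
    for (; e != 0; e >>= 1, a = mulmod(a, a))
        if (e & 1)
            r = mulmod(r, a);
    return r;
}

/* Hypothetical helper: res[k - lo] = coefficient k of A*B, lo <= k < hi.
   Inputs must be reduced mod MOD_P. */
static void mulmid_mod(uint64_t *res, const uint64_t *A, size_t lenA,
                       const uint64_t *B, size_t lenB, size_t lo, size_t hi)
{
    for (size_t k = lo; k < hi; k++)
    {
        uint64_t s = 0;
        for (size_t i = (k + 1 > lenB) ? k + 1 - lenB : 0; i <= k && i < lenA; i++)
            s = (s + mulmod(A[i], B[k - i])) % MOD_P;
        res[k - lo] = s;
    }
}

/* Hypothetical: g = 1/f mod x^n by Newton iteration. f must have at least n
   reduced coefficients and f[0] != 0. Returns 0 on success, -1 on OOM. */
static int inv_series_newton(uint64_t *g, const uint64_t *f, size_t n)
{
    assert(n >= 1 && f[0] != 0);
    uint64_t *e = malloc(n * sizeof *e);
    uint64_t *t = malloc(n * sizeof *t);
    if (e == NULL || t == NULL)
    {
        free(e);
        free(t);
        return -1;
    }
    g[0] = powmod(f[0], MOD_P - 2);   /* Fermat: f[0]^(p-2) = 1/f[0] */
    for (size_t m = 1; m < n; )
    {
        size_t m2 = (2 * m < n) ? 2 * m : n;
        size_t d = m2 - m;
        /* f*g = 1 + x^m E: E mod x^d is a middle product of f and g */
        mulmid_mod(e, f, m2, g, m, m, m2);
        /* g[m..m2-1] = -(g*E mod x^d), a truncated (low) product */
        mulmid_mod(t, g, d, e, d, 0, d);
        for (size_t i = 0; i < d; i++)
            g[m + i] = (MOD_P - t[i]) % MOD_P;
        m = m2;
    }
    free(e);
    free(t);
    return 0;
}
```

For example, inverting $f = 1 - x$ (coefficients 1 and MOD_P - 1) returns all ones, the coefficients of $1/(1-x)$.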

Various tuning parameters for nmod will need updating (the cutoffs for Newton iteration should decrease slightly), but I won't do that in this PR.

Currently gr_poly_mulmid falls back to a placeholder implementation unless overridden; overrides exist for nmod, fmpz, fmpq, arb and acb.
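The PR does not describe what the placeholder does. As a hedged illustration of the simplest correct fallback (not necessarily what gr_poly_mulmid does), a middle product can always be read off a truncated product: coefficients lo..hi-1 of $AB$ are the top hi - lo coefficients of $AB \bmod x^{hi}$, and only the first hi coefficients of each input can affect them. The sketch below uses hypothetical helper names and int64_t coefficients; it computes and then discards the lo low coefficients that a dedicated middle product avoids.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical truncated product: res[k] = coefficient k of A*B for k < n. */
static void mullow_naive(int64_t *res, const int64_t *A, size_t lenA,
                         const int64_t *B, size_t lenB, size_t n)
{
    for (size_t k = 0; k < n; k++)
    {
        int64_t s = 0;
        for (size_t i = 0; i <= k && i < lenA; i++)
            if (k - i < lenB)
                s += A[i] * B[k - i];
        res[k] = s;
    }
}

/* Hypothetical fallback: coefficients lo..hi-1 of A*B taken from
   A*B mod x^hi. Returns 0 on success, -1 on allocation failure. */
static int mulmid_via_mullow(int64_t *res, const int64_t *A, size_t lenA,
                             const int64_t *B, size_t lenB, size_t lo, size_t hi)
{
    assert(lo <= hi);
    /* input coefficients at index >= hi cannot affect the result */
    if (lenA > hi) lenA = hi;
    if (lenB > hi) lenB = hi;
    int64_t *tmp = malloc((hi > 0 ? hi : 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    mullow_naive(tmp, A, lenA, B, lenB, hi);
    for (size_t k = lo; k < hi; k++)
        res[k - lo] = tmp[k];   /* keep only the middle window */
    free(tmp);
    return 0;
}
```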

Timings for _nmod_poly_inv_series with a 64-bit modulus:

         n         old         new  speedup
       109    4.23e-06    4.22e-06  1.002x
       163    8.45e-06    8.49e-06  0.995x
       244    1.75e-05    1.75e-05  1.000x
       366    3.41e-05    3.37e-05  1.012x
       549    5.29e-05    5.23e-05  1.011x
       823    8.26e-05    8.08e-05  1.022x
      1234    0.000125    0.000122  1.025x
      1851    0.000191    0.000186  1.027x
      2776    0.000314    0.000305  1.030x
      4164    0.000481    0.000466  1.032x
      6246    0.000741    0.000722  1.026x
      9369     0.00112     0.00108  1.037x
     14053     0.00172     0.00168  1.024x
     21079     0.00257     0.00249  1.032x
     31618     0.00394     0.00382  1.031x
     47427     0.00626     0.00612  1.023x
     71140     0.00957     0.00936  1.022x
    106710      0.0148      0.0145  1.021x
    160065      0.0224      0.0219  1.023x
    240097       0.035      0.0344  1.017x
    360145      0.0597      0.0586  1.019x
    540217      0.0931      0.0913  1.020x
    810325       0.149       0.145  1.028x

Timings for _nmod_poly_exp_series with a 64-bit modulus:

         n         old         new  speedup
       109    5.35e-06     5.3e-06  1.009x
       163    1.02e-05       1e-05  1.020x
       244    2.02e-05       2e-05  1.010x
       366    4.03e-05    3.95e-05  1.020x
       549    7.45e-05     7.5e-05  0.993x
       823    0.000144    0.000143  1.007x
      1234    0.000278    0.000277  1.004x
      1851    0.000433    0.000413  1.048x
      2776    0.000646    0.000615  1.050x
      4164     0.00103    0.000986  1.045x
      6246     0.00151     0.00144  1.049x
      9369     0.00234     0.00222  1.054x
     14053     0.00347     0.00329  1.055x
     21079     0.00532     0.00506  1.051x
     31618     0.00793     0.00761  1.042x
     47427      0.0124      0.0121  1.025x
     71140      0.0192      0.0189  1.016x
    106710      0.0291      0.0285  1.021x
    160065      0.0451      0.0438  1.030x
    240097      0.0697      0.0676  1.031x
    360145       0.115       0.112  1.027x
    540217       0.182       0.182  1.000x
    810325       0.285       0.284  1.004x

As part of the nmod_poly implementation, I removed the unused bits parameter from the mul_KS methods.

@fredrik-johansson merged commit b04ff1f into flintlib:main on Mar 18, 2026
21 of 22 checks passed