[ENH] Feature constructor optimization #5975

Merged
VesnaT merged 3 commits into biolab:master from ales-erjavec:feature-constructor-opt
May 23, 2022
@ales-erjavec
Contributor

Issue

Ref: #5911

Parallel (or alternative) to #5949

Description of changes

Avoid iterating over table rows, running checks in inner loops, and constructing Value instances.
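
As a rough illustration of why skipping per-row iteration pays off, here is a minimal sketch that evaluates the same expression once per row and once over whole columns. It is not the code from this PR: `eval_per_row`, `eval_columns`, and `col_index` are hypothetical names, and plain NumPy arrays stand in for an Orange `Table`.

```python
import numpy as np


def eval_per_row(X, col_index, expr_code):
    """Evaluate the expression row by row: one dict and one eval() per row."""
    out = np.empty(X.shape[0])
    for i, row in enumerate(X):
        env = {name: row[j] for name, j in col_index.items()}
        out[i] = eval(expr_code, {"__builtins__": {}}, env)
    return out


def eval_columns(X, col_index, expr_code):
    """Evaluate the expression once, binding each name to a whole column."""
    env = {name: X[:, j] for name, j in col_index.items()}
    return np.asarray(eval(expr_code, {"__builtins__": {}}, env), dtype=float)


rng = np.random.default_rng(0)
X = rng.random((1000, 3))
# Hypothetical mapping from variable names to column positions.
col_index = {"Var0": 0, "Var1": 1, "Var2": 2}
code = compile("Var0 + Var1", "<expr>", "eval")

slow = eval_per_row(X, col_index, code)
fast = eval_columns(X, col_index, code)
assert np.allclose(slow, fast)
```

Both functions return the same values; the column-wise version makes a single `eval` call and leaves the arithmetic to NumPy, which is the general idea behind the description above.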

Includes
  • Code changes
  • Tests
  • Documentation

@codecov

codecov bot commented May 17, 2022

Codecov Report

Merging #5975 (92b2864) into master (369f2c3) will increase coverage by 0.02%.
The diff coverage is 98.90%.

@@            Coverage Diff             @@
##           master    #5975      +/-   ##
==========================================
+ Coverage   86.40%   86.43%   +0.02%     
==========================================
  Files         315      315              
  Lines       67155    67218      +63     
==========================================
+ Hits        58025    58097      +72     
+ Misses       9130     9121       -9     

@ales-erjavec
Contributor Author

Benchmarked using the following script:

import statistics
import timeit

import numpy as np

from Orange.data import ContinuousVariable, Domain, Table
from Orange.widgets.data.owfeatureconstructor import (
    ContinuousDescriptor,
    construct_variables,
)

N = 100_000  # number of rows in the benchmark table
table = new_domain = None


def setup():
    """Build a random 100-column table and a domain with one constructed feature."""
    global table, new_domain
    var_names = [f"Var{i}" for i in range(100)]
    variables = [ContinuousVariable(name=var_name) for var_name in var_names]
    domain = Domain(variables)
    test_data = np.random.rand(N, len(variables))
    table = Table.from_numpy(domain, test_data)

    # The constructed feature is the sum of the first two columns.
    expr = "Var0 + Var1"
    desc = ContinuousDescriptor(expr, expr, number_of_decimals=None)
    constr_vars = construct_variables([desc], table)
    new_domain = Domain(table.domain.attributes + tuple(constr_vars))


def run():
    # Only this call is timed: it computes the constructed column for all rows.
    table.transform(new_domain)


# setup() runs once before each of the 10 repetitions and is not included in the timing.
res = timeit.repeat(
    "run()", setup="setup()", globals=globals(), repeat=10, number=1
)
print(f"Avg: {statistics.mean(res):.4f}, Min: {min(res):.4f}, Std: {statistics.stdev(res):.4f}")

Before: Avg: 0.9430, Min: 0.9309, Std: 0.0086
After: Avg: 0.0451, Min: 0.0447, Std: 0.0002
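
For reference, the speedup implied by these timings can be worked out directly. The sketch below uses only the Avg and Min figures reported above; the dictionary names are illustrative.

```python
# Timings in seconds, copied from the Before/After lines above.
before = {"avg": 0.9430, "min": 0.9309}
after = {"avg": 0.0451, "min": 0.0447}

# Speedup is the ratio of the old time to the new time.
speedup = {key: before[key] / after[key] for key in before}

for key, value in speedup.items():
    print(f"{key}: {value:.1f}x")
```

Both ratios come out at roughly 21x.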

@markotoplak
Member

Wow, 20x speedup!

@ales-erjavec force-pushed the feature-constructor-opt branch 2 times, most recently from 9424cfe to d133f43 (May 18, 2022 14:14)
@ales-erjavec force-pushed the feature-constructor-opt branch from d133f43 to ad56f9f (May 18, 2022 14:21)
@ales-erjavec force-pushed the feature-constructor-opt branch from df02626 to 92b2864 (May 20, 2022 12:21)
@VesnaT merged commit 67e9629 into biolab:master (May 23, 2022)
@ales-erjavec deleted the feature-constructor-opt branch (April 26, 2024 09:10)