Commit f657566
authored
perf: optimize compute.Take for fewer memory allocations (apache#557)
### Rationale for this change
When writing data to partitioned Iceberg tables using `iceberg-go`'s new
partitioned write capability, I found that Arrow's `compute.Take`
operation was causing the writes to be significantly slower than
unpartitioned writes.
Causes:
- the `VarBinaryImpl` function here in `arrow-go` was pre-allocating the
data buffer with only `meanValueLen` (size for just one string/buffer),
not the total size needed for all strings. For, my use case (writing
100k rows at a time) this was causing 15x buffer reallocations as the
buffer incrementally scales up. For my iceberg table with 20+
string/binary cols of various size this is significant overhead.
- when `VarBinaryImpl` allocates additional space for new items, it
would only add the exact amount needed for the new value. This caused
O(n) reallocations.
### What changes are included in this PR?
1. Pre-allocate upfront with a better estimate of the necessary buffer
size to eliminate repeated reallocations.
I somewhat arbitrarily chose a cap of 16MB as a guess at what would be
effective at reducing the number of allocations but also trying to be
cognizant of the library being used in many bespoke scenarios and not
wanting to make massive memory spikes. For my use case, I never hit this
16MB threshold and it could be smaller.
I am curious for your input on whether there should be a cap at all or
what a reasonable cap would be.
2. Use exponential growth for additional allocations for O(log n) total
reallocations.
### Are these changes tested?
No unit tests.
However, with a dedicated reproducing script that mimics my use case
with different write sizes (on a macbook pro m3 max + 64GB ram) average
of 3 `table.Append()` calls:
| Configuration | 100k rows | 500k rows | 1M rows | 2.5M rows | 10M rows
|
|--------------|-----------|-----------|---------|-----------|-----------|
| **Before** | 4.1s | 53.3s | didn't try | didn't try | didn't try |
| **Change 1** | 336ms | 2.41s | 8.75s | cancelled after 5mins | didn't
try |
| **Change 2** | 227ms | 897ms | 1.72s | 4.10s | 17.1s |
### Are there any user-facing changes?
No; just more performant1 parent a9b0be4 commit f657566
File tree
2 files changed
+237
-2
lines changed- arrow/compute
- internal/kernels
2 files changed
+237
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1486 | 1486 | | |
1487 | 1487 | | |
1488 | 1488 | | |
1489 | | - | |
| 1489 | + | |
| 1490 | + | |
| 1491 | + | |
| 1492 | + | |
| 1493 | + | |
| 1494 | + | |
1490 | 1495 | | |
1491 | 1496 | | |
1492 | 1497 | | |
| |||
1503 | 1508 | | |
1504 | 1509 | | |
1505 | 1510 | | |
1506 | | - | |
| 1511 | + | |
| 1512 | + | |
| 1513 | + | |
| 1514 | + | |
| 1515 | + | |
| 1516 | + | |
| 1517 | + | |
| 1518 | + | |
| 1519 | + | |
| 1520 | + | |
| 1521 | + | |
| 1522 | + | |
| 1523 | + | |
| 1524 | + | |
| 1525 | + | |
| 1526 | + | |
1507 | 1527 | | |
1508 | 1528 | | |
1509 | 1529 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1650 | 1650 | | |
1651 | 1651 | | |
1652 | 1652 | | |
| 1653 | + | |
| 1654 | + | |
| 1655 | + | |
| 1656 | + | |
| 1657 | + | |
| 1658 | + | |
| 1659 | + | |
| 1660 | + | |
| 1661 | + | |
| 1662 | + | |
| 1663 | + | |
| 1664 | + | |
| 1665 | + | |
| 1666 | + | |
| 1667 | + | |
| 1668 | + | |
| 1669 | + | |
| 1670 | + | |
| 1671 | + | |
| 1672 | + | |
| 1673 | + | |
| 1674 | + | |
| 1675 | + | |
| 1676 | + | |
| 1677 | + | |
| 1678 | + | |
| 1679 | + | |
| 1680 | + | |
| 1681 | + | |
| 1682 | + | |
| 1683 | + | |
| 1684 | + | |
| 1685 | + | |
| 1686 | + | |
| 1687 | + | |
| 1688 | + | |
| 1689 | + | |
| 1690 | + | |
| 1691 | + | |
| 1692 | + | |
| 1693 | + | |
| 1694 | + | |
| 1695 | + | |
| 1696 | + | |
| 1697 | + | |
| 1698 | + | |
| 1699 | + | |
| 1700 | + | |
| 1701 | + | |
| 1702 | + | |
| 1703 | + | |
| 1704 | + | |
| 1705 | + | |
| 1706 | + | |
| 1707 | + | |
| 1708 | + | |
| 1709 | + | |
| 1710 | + | |
| 1711 | + | |
| 1712 | + | |
| 1713 | + | |
| 1714 | + | |
| 1715 | + | |
| 1716 | + | |
| 1717 | + | |
| 1718 | + | |
| 1719 | + | |
| 1720 | + | |
| 1721 | + | |
| 1722 | + | |
| 1723 | + | |
| 1724 | + | |
| 1725 | + | |
| 1726 | + | |
| 1727 | + | |
| 1728 | + | |
| 1729 | + | |
| 1730 | + | |
| 1731 | + | |
| 1732 | + | |
| 1733 | + | |
| 1734 | + | |
| 1735 | + | |
| 1736 | + | |
| 1737 | + | |
| 1738 | + | |
| 1739 | + | |
| 1740 | + | |
| 1741 | + | |
| 1742 | + | |
| 1743 | + | |
| 1744 | + | |
| 1745 | + | |
| 1746 | + | |
| 1747 | + | |
| 1748 | + | |
| 1749 | + | |
| 1750 | + | |
| 1751 | + | |
| 1752 | + | |
| 1753 | + | |
| 1754 | + | |
| 1755 | + | |
| 1756 | + | |
| 1757 | + | |
| 1758 | + | |
| 1759 | + | |
| 1760 | + | |
| 1761 | + | |
| 1762 | + | |
| 1763 | + | |
| 1764 | + | |
| 1765 | + | |
| 1766 | + | |
| 1767 | + | |
| 1768 | + | |
| 1769 | + | |
| 1770 | + | |
| 1771 | + | |
| 1772 | + | |
| 1773 | + | |
| 1774 | + | |
| 1775 | + | |
| 1776 | + | |
| 1777 | + | |
| 1778 | + | |
| 1779 | + | |
| 1780 | + | |
| 1781 | + | |
| 1782 | + | |
| 1783 | + | |
| 1784 | + | |
| 1785 | + | |
| 1786 | + | |
| 1787 | + | |
| 1788 | + | |
| 1789 | + | |
| 1790 | + | |
| 1791 | + | |
| 1792 | + | |
| 1793 | + | |
| 1794 | + | |
| 1795 | + | |
| 1796 | + | |
| 1797 | + | |
| 1798 | + | |
| 1799 | + | |
| 1800 | + | |
| 1801 | + | |
| 1802 | + | |
| 1803 | + | |
| 1804 | + | |
| 1805 | + | |
| 1806 | + | |
| 1807 | + | |
| 1808 | + | |
| 1809 | + | |
| 1810 | + | |
| 1811 | + | |
| 1812 | + | |
| 1813 | + | |
| 1814 | + | |
| 1815 | + | |
| 1816 | + | |
| 1817 | + | |
| 1818 | + | |
| 1819 | + | |
| 1820 | + | |
| 1821 | + | |
| 1822 | + | |
| 1823 | + | |
| 1824 | + | |
| 1825 | + | |
| 1826 | + | |
| 1827 | + | |
| 1828 | + | |
| 1829 | + | |
| 1830 | + | |
| 1831 | + | |
| 1832 | + | |
| 1833 | + | |
| 1834 | + | |
| 1835 | + | |
| 1836 | + | |
| 1837 | + | |
| 1838 | + | |
| 1839 | + | |
| 1840 | + | |
| 1841 | + | |
| 1842 | + | |
| 1843 | + | |
| 1844 | + | |
| 1845 | + | |
| 1846 | + | |
| 1847 | + | |
| 1848 | + | |
| 1849 | + | |
| 1850 | + | |
| 1851 | + | |
| 1852 | + | |
| 1853 | + | |
| 1854 | + | |
| 1855 | + | |
| 1856 | + | |
| 1857 | + | |
| 1858 | + | |
| 1859 | + | |
| 1860 | + | |
| 1861 | + | |
| 1862 | + | |
| 1863 | + | |
| 1864 | + | |
| 1865 | + | |
| 1866 | + | |
| 1867 | + | |
0 commit comments