[C++] `arrow::schema` construction performance degrades to O(n^2) on libc++ for duplicate/unnamed fields

### Describe the bug, including details regarding any error messages, version, and platform.

`arrow::schema` construction performance degrades to quadratic time on libc++ when there's a lot of fields with duplicate or no names. This is most critical when creating wide schemas programmatically.

The performance degradation is caused by the function [CreateNameToIndexMap](https://github.com/apache/arrow/blob/75ef03165dc7e54aca0f842f54598833ddbd0a01/cpp/src/arrow/type.cc#L1332-L1339). The function calls `std::unordered_multimap::emplace` once per field. On libc++, this function call has linear runtime (amortized) for duplicate keys because it iterates past every existing entry to emplace the new element at the end of the range (see LLVM bugzilla [16747](https://bugs.llvm.org/show_bug.cgi?id=16747) and [21275](https://bugs.llvm.org/show_bug.cgi?id=21275)).

Some performance benchmarks creating a schema with N int32 fields:
| N | Time (s) |Growth factor (vs previous input) |
| - | - | - |
| 500    | 0.000464958s | NA |
| 5000   | 0.041429s | 89.1x |
| 50000  | 4.12813s  | 99.6x |
| 500000 | 416.985s  | 101.0x |

One fix is to use `map.emplace_hint(map.find(key), ...)` as recommended by libc++ maintainers. This avoids the linear scan but may reverse order for duplicates. Alternatively, using `std::unordered_map<std::string_view, std::vector<int>>` keeps the order but is a slightly larger refactor.

The same benchmark fixing the issue with the latter approach (unordered_map + vector):
| N | Time (s) | Growth factor (vs previous input) |
| - | - | - |
| 500    | 0.000040s | NA |
| 5000   | 0.000066s | 1.65x |
| 50000  | 0.000681s | 10.3x |
| 500000 | 0.007084s | 10.4x |

I can submit a PR if a fix is worth pursuing.

### Component(s)

C++

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[C++] `arrow::schema` construction performance degrades to O(n^2) on libc++ for duplicate/unnamed fields #48977

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

N	Time (s)	Growth factor (vs previous input)
500	0.000464958s	NA
5000	0.041429s	89.1x
50000	4.12813s	99.6x
500000	416.985s	101.0x

N	Time (s)	Growth factor (vs previous input)
500	0.000040s	NA
5000	0.000066s	1.65x
50000	0.000681s	10.3x
500000	0.007084s	10.4x

[C++] arrow::schema construction performance degrades to O(n^2) on libc++ for duplicate/unnamed fields #48977

Description

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

[C++] `arrow::schema` construction performance degrades to O(n^2) on libc++ for duplicate/unnamed fields #48977