Commit 999c1a4

docs: reformat markdown (#15650)
* docs: reformat markdown

  Reformat all the files, remove the exclusion. Also fix a bug in the check, so it will still fail if there are markdown problems that couldn't automatically be fixed. Most problems are autofixed without manual effort.

* build: fix rumdl configuration to match editorconfig (indent=4)

  The default is a 2-space indent, but editorconfig says 4. Make these consistent so there is the least friction for developers.

1 parent 77744c0 commit 999c1a4

14 files changed: +62 -71 lines changed

.pre-commit-config.yml

Lines changed: 1 addition & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -162,28 +162,12 @@ repos:
162162
name: Fix Markdown
163163
language: system
164164
entry: uv
165-
args: [ 'run', 'rumdl', 'fmt' ]
165+
args: [ 'run', 'rumdl', 'check', '--fix' ]
166166
env:
167167
UV_PROJECT: dev-tools
168168
UV_FROZEN: "1"
169169
types: [ 'markdown']
170170
require_serial: true
171-
exclude:
172-
glob:
173-
# TODO: fix formatting of these files separately
174-
- .github/PULL_REQUEST_TEMPLATE.md
175-
- CONTRIBUTING.md
176-
- dev-docs/file-formats.md
177-
- dev-docs/github-issues-howto.md
178-
- dev-tools/aws-jmh/README.md
179-
- dev-tools/scripts/README.md
180-
- lucene/backward-codecs/README.md
181-
- lucene/distribution/src/binary-release/README.md
182-
- lucene/luke/README.md
183-
- lucene/luke/src/distribution/README.md
184-
- lucene/MIGRATE.md
185-
- lucene/SYSTEM_REQUIREMENTS.md
186-
- README.md
187171

188172
- id: ruff-check
189173
name: Fix Python

.rumdl.toml

Lines changed: 8 additions & 0 deletions
@@ -3,3 +3,11 @@
33
line-length = 0
44
# not really a markdown file, but a template
55
exclude = [ "lucene/documentation/src/markdown/index.template.md" ]
6+
7+
[MD007]
8+
# match indentation set in .editorconfig for least friction
9+
indent = 4
10+
11+
[per-file-ignores]
12+
# doesn't start with level 1 heading on purpose
13+
".github/PULL_REQUEST_TEMPLATE.md" = [ "MD041" ]

CONTRIBUTING.md

Lines changed: 4 additions & 4 deletions
@@ -59,9 +59,9 @@ In case your contribution fixes a bug, please create a new test case that fails
5959
### IDE support
6060

6161
- *IntelliJ* - IntelliJ idea can import and build gradle-based projects out of the box. It will default to running tests by calling the gradle wrapper, and while this works, it is can be a bit slow. If instead you configure IntelliJ to use its own built-in test runner by (in 2024 version) navigating to settings for Build Execution & Deployment/Build Tools/Gradle (under File/Settings menu on some platforms) and selecting "Build and Run using: IntelliJ IDEA" and "Run Tests using: IntelliJ IDEA", then some tests will run faster. However some other tests will not run using this configuration.
62-
- *Eclipse* - Basic support ([help/IDEs.txt](https://github.com/apache/lucene/blob/main/help/IDEs.txt#L7)).
63-
- *VSCode* - Basic support ([help/IDEs.txt](https://github.com/apache/lucene/blob/main/help/IDEs.txt#L23)).
64-
- *Neovim* - Basic support ([help/IDEs.txt](https://github.com/apache/lucene/blob/main/help/IDEs.txt#L32)).
62+
- *Eclipse* - Basic support ([help/IDEs.txt](https://github.com/apache/lucene/blob/main/help/IDEs.txt#L7)).
63+
- *VSCode* - Basic support ([help/IDEs.txt](https://github.com/apache/lucene/blob/main/help/IDEs.txt#L23)).
64+
- *Neovim* - Basic support ([help/IDEs.txt](https://github.com/apache/lucene/blob/main/help/IDEs.txt#L32)).
6565
- *Netbeans* - Not tested.
6666

6767
## Benchmarking
@@ -78,7 +78,7 @@ Feel free to share your findings (especially if your implementation performs bet
7878

7979
## Contributing your work
8080

81-
You can open a pull request at https://github.com/apache/lucene.
81+
You can open a pull request at <https://github.com/apache/lucene>.
8282

8383
Please be patient. Committers are busy people too. If no one responds to your patch after a few days, please make friendly reminders. Please incorporate others' suggestions into your patch if you think they're reasonable. Finally, remember that even a patch that is not committed is useful to the community.
8484

README.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ written in Java.
2727

2828
## Online Documentation
2929

30-
This README file only contains basic setup instructions. For more
30+
This README file only contains basic setup instructions. For more
3131
comprehensive documentation, visit:
3232

3333
- Latest Releases: <https://lucene.apache.org/core/documentation.html>
@@ -38,7 +38,7 @@ comprehensive documentation, visit:
3838

3939
## Building
4040

41-
### Basic steps:
41+
### Basic steps
4242

4343
1. Install JDK 25 using your package manager or download manually from
4444
[OpenJDK](https://jdk.java.net/),
@@ -48,7 +48,7 @@ comprehensive documentation, visit:
4848
2. Clone Lucene's git repository (or download the source distribution).
4949
3. Run gradle launcher script (`gradlew`).
5050

51-
We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.
51+
We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at <https://jdk.java.net/> and learning more about Java, before returning to this README.
5252

5353
## Contributing
5454

dev-docs/file-formats.md

Lines changed: 3 additions & 1 deletion
@@ -36,12 +36,14 @@ on their own.
3636
## How to split the data into files?
3737

3838
Most file formats split the data into 3 files:
39+
3940
- metadata,
4041
- index data,
4142
- raw data.
4243

4344
The metadata file contains all the data that is read once at open time. This
4445
helps on several fronts:
46+
4547
- One can validate the checksums of this data at open time without significant
4648
overhead since all data needs to be read anyway, this helps detect
4749
corruptions early.
@@ -124,4 +126,4 @@ by merges. All default implementations do this.
124126

125127
## How to make backward-compatible changes to file formats?
126128

127-
See [here](../lucene/backward-codecs/README.md).
129+
See [Index Backwards Compatibility](../lucene/backward-codecs/README.md).

dev-docs/github-issues-howto.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ All issues/PRs associated with a milestone must be resolved before the release,
2929

3030
Once the release is done, the Milestone should be closed then a new Milestone for the next release should be created.
3131

32-
You can see the list of current active (opened) Milestones here. https://github.com/apache/lucene/milestones
32+
You can see the list of current active (opened) Milestones here. <https://github.com/apache/lucene/milestones>
3333

3434
See [GitHub documentation](https://docs.github.com/en/issues/using-labels-and-milestones-to-track-work/about-milestones) for more details.
3535

dev-tools/aws-jmh/README.md

Lines changed: 6 additions & 8 deletions
@@ -15,23 +15,21 @@
1515
limitations under the License.
1616
-->
1717

18+
# EC2 Microbenchmarks
19+
1820
Runs lucene microbenchmarks across a variety of CPUs in EC2.
1921

2022
Example:
2123

22-
```console
23-
export AWS_ACCESS_KEY_ID=xxxxx
24-
export AWS_SECRET_ACCESS_KEY=yyyy
25-
make PATCH_BRANCH=rmuir:some-speedup
26-
```
24+
export AWS_ACCESS_KEY_ID=xxxxx
25+
export AWS_SECRET_ACCESS_KEY=yyyy
26+
make PATCH_BRANCH=rmuir:some-speedup
2727

2828
Results file will be in build/report.txt
2929

3030
You can also pass additional JMH args if you want:
3131

32-
```console
33-
make PATCH_BRANCH=rmuir:some-speedup JMH_ARGS='float -p size=756'
34-
```
32+
make PATCH_BRANCH=rmuir:some-speedup JMH_ARGS='float -p size=756'
3533

3634
Prerequisites:
3735

dev-tools/scripts/README.md

Lines changed: 0 additions & 1 deletion
@@ -194,4 +194,3 @@ and prints a regular expression that will match all of them
194194
### gitignore-gen.sh
195195

196196
TBD
197-

lucene/MIGRATE.md

Lines changed: 27 additions & 24 deletions
@@ -36,18 +36,21 @@ Starting with Lucene 11.0.0, the index upgrade policy has been relaxed to allow
3636

3737
#### Upgrade Scenarios
3838

39-
**Scenario 1: No format breaks (wider upgrade span)**
39+
##### Scenario 1: No format breaks (wider upgrade span)
40+
4041
- Index created with Lucene 10.x can be opened directly in Lucene 11.x, 12.x, 13.x, 14.x (as long as MIN_SUPPORTED_MAJOR stays ≤ 10)
4142
- Simply open the index with the new version; segments will be upgraded gradually through normal merging
4243
- Optional: Call `forceMerge()` or use `UpgradeIndexMergePolicy` to upgrade segment formats immediately
4344
- **Important**: You still only get one upgrade per index lifetime. Once MIN_SUPPORTED_MAJOR is bumped above 10, the index becomes unopenable and must be reindexed.
4445

45-
**Scenario 2: Format breaks occur**
46+
##### Scenario 2: Format breaks occur
47+
4648
- If a major version introduces incompatible format changes, `MIN_SUPPORTED_MAJOR` will be bumped
4749
- Indexes created before the new minimum will throw `IndexFormatTooOldException`
4850
- Full reindexing is required for such indexes
4951

50-
**Scenario 3: After using your upgrade**
52+
##### Scenario 3: After using your upgrade
53+
5154
- Index created with Lucene 10.x, successfully opened with Lucene 14.x
5255
- The index's creation version is still 10 (this never changes)
5356
- When Lucene 15+ bumps MIN_SUPPORTED_MAJOR above 10, this index becomes unopenable
@@ -72,6 +75,7 @@ try (Directory dir = FSDirectory.open(indexPath)) {
7275
#### Error Handling
7376

7477
Enhanced error messages will clearly indicate:
78+
7579
- Whether the index creation version is below `MIN_SUPPORTED_MAJOR` (reindex required)
7680
- Whether segments are too old to read directly (sequential upgrade required)
7781

@@ -85,7 +89,7 @@ number of segments that may be merged together.
8589
Query caching is now disabled by default. To enable caching back, do something
8690
like below in a static initialization block:
8791

88-
```
92+
```java
8993
int maxCachedQueries = 1_000;
9094
long maxRamBytesUsed = 50 * 1024 * 1024; // 50MB
9195
IndexSearcher.setDefaultQueryCache(new LRUQueryCache(maxCachedQueries, maxRamBytesUsed));
@@ -124,11 +128,11 @@ DataInput.readGroupVInt method: subclasses should delegate or reimplement it ent
124128

125129
### OpenNLP dependency upgrade
126130

127-
[Apache OpenNLP](https://opennlp.apache.org) 2.x opens the door to accessing various models via the ONNX runtime. To migrate you will need to update any deprecated OpenNLP methods that you may be using.
131+
[Apache OpenNLP](https://opennlp.apache.org) 2.x opens the door to accessing various models via the ONNX runtime. To migrate you will need to update any deprecated OpenNLP methods that you may be using.
128132

129133
### Snowball dependency upgrade
130134

131-
Snowball has folded the "German2" stemmer into their "German" stemmer, so there's no "German2" anymore. For Lucene APIs (TokenFilter, TokenFilterFactory) that accept String, "German2" will be mapped to "German" to avoid breaking users. If you were previously creating German2Stemmer instances, you'll need to change your code to create GermanStemmer instances instead. For more information see https://snowballstem.org/algorithms/german2/stemmer.html
135+
Snowball has folded the "German2" stemmer into their "German" stemmer, so there's no "German2" anymore. For Lucene APIs (TokenFilter, TokenFilterFactory) that accept String, "German2" will be mapped to "German" to avoid breaking users. If you were previously creating German2Stemmer instances, you'll need to change your code to create GermanStemmer instances instead. For more information see <https://snowballstem.org/algorithms/german2/stemmer.html>
132136

133137
### Romanian analysis
134138

@@ -155,6 +159,7 @@ Instead, call storedFields()/termVectors() to return an instance which can fetch
155159
and will be garbage-collected as usual.
156160

157161
For example:
162+
158163
```java
159164
TopDocs hits = searcher.search(query, 10);
160165
StoredFields storedFields = reader.storedFields();
@@ -230,7 +235,6 @@ for the currently-positioned document (doing so will result in undefined behavio
230235
`IOContext.READONCE` for opening internally, as that's the only valid usage pattern for checksum input.
231236
Callers should remove the parameter when calling this method.
232237

233-
234238
### DaciukMihovAutomatonBuilder is renamed to StringsToAutomaton and made package-private
235239

236240
The former `DaciukMihovAutomatonBuilder#build` functionality is exposed through `Automata#makeStringUnion`.
@@ -300,7 +304,7 @@ access the members using method calls instead of field accesses. Affected classe
300304
- `TermAndVector` (GITHUB#13772)
301305
- Many basic Lucene classes, including `CollectionStatistics`, `TermStatistics` and `LeafMetadata` (GITHUB#13328)
302306

303-
### Boolean flags on IOContext replaced with a new ReadAdvice enum.
307+
### Boolean flags on IOContext replaced with a new ReadAdvice enum
304308

305309
The `readOnce`, `load` and `random` flags on `IOContext` have been replaced with a new `ReadAdvice`
306310
enum.
@@ -324,6 +328,7 @@ To migrate, use a provided `CollectorManager` implementation that suits your use
324328
to follow the new API pattern. The straight forward approach would be to instantiate the single-threaded `Collector` in a wrapper `CollectorManager`.
325329

326330
For example
331+
327332
```java
328333
public class CustomCollectorManager implements CollectorManager<CustomCollector, List<Object>> {
329334
@Override
@@ -354,12 +359,12 @@ List<Object> results = searcher.search(query, new CustomCollectorManager());
354359

355360
1. `IntField(String name, int value)`. Use `IntField(String, int, Field.Store)` with `Field.Store#NO` instead.
356361
2. `DoubleField(String name, double value)`. Use `DoubleField(String, double, Field.Store)` with `Field.Store#NO` instead.
357-
2. `FloatField(String name, float value)`. Use `FloatField(String, float, Field.Store)` with `Field.Store#NO` instead.
358-
3. `LongField(String name, long value)`. Use `LongField(String, long, Field.Store)` with `Field.Store#NO` instead.
359-
4. `LongPoint#newDistanceFeatureQuery(String field, float weight, long origin, long pivotDistance)`. Use `LongField#newDistanceFeatureQuery` instead
360-
5. `BooleanQuery#TooManyClauses`, `BooleanQuery#getMaxClauseCount()`, `BooleanQuery#setMaxClauseCount()`. Use `IndexSearcher#TooManyClauses`, `IndexSearcher#getMaxClauseCount()`, `IndexSearcher#setMaxClauseCount()` instead
361-
6. `ByteBuffersDataInput#size()`. Use `ByteBuffersDataInput#length()` instead
362-
7. `SortedSetDocValuesFacetField#label`. `FacetsConfig#pathToString(String[])` can be applied to path as a replacement if string path is desired.
362+
3. `FloatField(String name, float value)`. Use `FloatField(String, float, Field.Store)` with `Field.Store#NO` instead.
363+
4. `LongField(String name, long value)`. Use `LongField(String, long, Field.Store)` with `Field.Store#NO` instead.
364+
5. `LongPoint#newDistanceFeatureQuery(String field, float weight, long origin, long pivotDistance)`. Use `LongField#newDistanceFeatureQuery` instead
365+
6. `BooleanQuery#TooManyClauses`, `BooleanQuery#getMaxClauseCount()`, `BooleanQuery#setMaxClauseCount()`. Use `IndexSearcher#TooManyClauses`, `IndexSearcher#getMaxClauseCount()`, `IndexSearcher#setMaxClauseCount()` instead
366+
7. `ByteBuffersDataInput#size()`. Use `ByteBuffersDataInput#length()` instead
367+
8. `SortedSetDocValuesFacetField#label`. `FacetsConfig#pathToString(String[])` can be applied to path as a replacement if string path is desired.
363368

364369
### Auto I/O throttling disabled by default in ConcurrentMergeScheduler (GITHUB#13293)
365370

@@ -439,7 +444,6 @@ to the new coordinates:
439444
|org.apache.lucene:lucene-analyzers-smartcn |org.apache.lucene:lucene-analysis-smartcn |
440445
|org.apache.lucene:lucene-analyzers-stempel |org.apache.lucene:lucene-analysis-stempel |
441446

442-
443447
### LucenePackage class removed (LUCENE-10260)
444448

445449
`LucenePackage` class has been removed. The implementation string can be
@@ -563,7 +567,7 @@ User dictionary now strictly validates if the (concatenated) segment is the same
563567
unexpected runtime exceptions or behaviours.
564568
For example, these entries are not allowed at all and an exception is thrown when loading the dictionary file.
565569

566-
```
570+
```text
567571
# concatenated "日本経済新聞" does not match the surface form "日経新聞"
568572
日経新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
569573
@@ -631,7 +635,7 @@ is discouraged in favor of the default `MMapDirectory`.
631635
### Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014)
632636

633637
`SpanQuery` and `PhraseQuery` now always calculate their slops as
634-
`(1.0 / (1.0 + distance))`. Payload factor calculation is performed by
638+
`(1.0 / (1.0 + distance))`. Payload factor calculation is performed by
635639
`PayloadDecoder` in the `lucene-queries` module.
636640

637641
### Scorer must produce positive scores (LUCENE-7996)
@@ -645,9 +649,9 @@ As a side-effect of this change, negative boosts are now rejected and
645649

646650
### CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099)
647651

648-
Instead use `FunctionScoreQuery` and a `DoubleValuesSource` implementation. `BoostedQuery`
652+
Instead use `FunctionScoreQuery` and a `DoubleValuesSource` implementation. `BoostedQuery`
649653
and `BoostingQuery` may be replaced by calls to `FunctionScoreQuery.boostByValue()` and
650-
`FunctionScoreQuery.boostByQuery()`. To replace more complex calculations in
654+
`FunctionScoreQuery.boostByQuery()`. To replace more complex calculations in
651655
`CustomScoreQuery`, use the `lucene-expressions` module:
652656

653657
```java
@@ -666,7 +670,6 @@ Changing `IndexOptions` for a field on the fly will now result into an
666670
(`FieldType.indexOptions() != IndexOptions.NONE`) then all documents must have
667671
the same index options for that field.
668672

669-
670673
### IndexSearcher.createNormalizedWeight() removed (LUCENE-8242)
671674

672675
Instead use `IndexSearcher.createWeight()`, rewriting the query first, and using
@@ -744,7 +747,7 @@ Lucene.
744747
### LeafCollector.setScorer() now takes a Scorable rather than a Scorer (LUCENE-6228)
745748

746749
`Scorer` has a number of methods that should never be called from `Collector`s, for example
747-
those that advance the underlying iterators. To hide these, `LeafCollector.setScorer()`
750+
those that advance the underlying iterators. To hide these, `LeafCollector.setScorer()`
748751
now takes a `Scorable`, an abstract class that scorers can extend, with methods
749752
`docId()` and `score()`.
750753

@@ -981,10 +984,10 @@ removed in favour of the newly introduced `search(LeafReaderContextPartition[] p
981984
### Indexing vectors with 8 bit scalar quantization is no longer supported but 7 and 4 bit quantization still work (GITHUB#13519)
982985

983986
8 bit scalar vector quantization is no longer supported: it was buggy
984-
starting in 9.11 (GITHUB#13197). 4 and 7 bit quantization are still
985-
supported. Existing (9.11) Lucene indices that previously used 8 bit
987+
starting in 9.11 (GITHUB#13197). 4 and 7 bit quantization are still
988+
supported. Existing (9.11) Lucene indices that previously used 8 bit
986989
quantization can still be read/searched but the results from
987-
`KNN*VectorQuery` are silently buggy. Further 8 bit quantized vector
990+
`KNN*VectorQuery` are silently buggy. Further 8 bit quantized vector
988991
indexing into such (9.11) indices is not permitted, so your path
989992
forward if you wish to continue using the same 9.11 index is to index
990993
additional vectors into the same field with either 4 or 7 bit

lucene/SYSTEM_REQUIREMENTS.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Apache Lucene runs on Java 25 or greater.
2121

2222
It is also recommended to always use the latest update version of your
2323
Java VM, because bugs may affect Lucene. An overview of known JVM bugs
24-
can be found on https://cwiki.apache.org/confluence/display/LUCENE/JavaBugs
24+
can be found on <https://cwiki.apache.org/confluence/display/LUCENE/JavaBugs>
2525

2626
With all Java versions it is strongly recommended to not use experimental
2727
`-XX` JVM options.
