Commit 90c156e

Improve document for IDE support and simplify doc build (#654)
* Update developer-tools.md
* Update README.md
* Create run-in-container.sh
* Update developer-tools.html
* fix
* better
* Update README.md
* improve
* Apply suggestions from code review
* final
* address comments
1 parent 667ac6a commit 90c156e

6 files changed: +121 -38 lines changed

.dev/build-docs.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+docker run \
+  -e HOST_UID=$(id -u) \
+  -e HOST_GID=$(id -g) \
+  --mount type=bind,source="$PWD",target="/spark-website" \
+  -w /spark-website \
+  docs-builder:latest \
+  /bin/bash -c "sh .dev/run-in-container.sh"
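The `docker run` invocation above passes the host's UID and GID into the container so that files written into the bind-mounted `spark-website` directory stay owned by the host user rather than root. A minimal sketch of that hand-off, in plain shell with no Docker required (the echoed `useradd` line is illustrative only):

```shell
# Sketch of the UID/GID hand-off used by build-docs.sh (illustrative only).
# The host exports its numeric IDs; the container-side script reads them to
# create a matching user, so bind-mounted output isn't owned by root.
HOST_UID=$(id -u)
HOST_GID=$(id -g)
export HOST_UID HOST_GID

# Stand-in for the containerized step; the real flow passes these via
# `docker run -e HOST_UID=... -e HOST_GID=...`.
sh -c 'echo "would run: useradd -u $HOST_UID -g $HOST_GID -m docuser"'
```

Because the variables are exported, the child `sh -c` process sees the same numeric IDs the host user has.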

.dev/run-in-container.sh

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# 1. Set env variables.
+export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-arm64
+export PATH=$JAVA_HOME/bin:$PATH
+
+# 2. Install bundler.
+gem install bundler -v 2.4.22
+bundle install
+
+# 3. Create a user matching the host UID/GID.
+groupadd -g $HOST_GID docuser
+useradd -u $HOST_UID -g $HOST_GID -m docuser
+
+# 4. Link `python3` to `python3.11`, which contains the prerequisite packages.
+ln -s "$(which python3.11)" "/usr/local/bin/python3"
+
+# 5. Build the docs as the host user.
+rm -rf .jekyll-cache
+su docuser -c "bundle exec jekyll build"
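The symlink step in `run-in-container.sh` simply points the generic `python3` name at `python3.11`, the interpreter that has the prerequisite packages installed. The same trick can be sketched in a throwaway directory so it runs without root; the `python3.11` "interpreter" below is a fabricated stand-in for illustration:

```shell
# Re-creates the `ln -s "$(which python3.11)" ...` trick from the script in a
# temp directory so it can run without root. The python3.11 "interpreter"
# here is a fabricated stand-in for illustration.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho 3.11\n' > "$bindir/python3.11"
chmod +x "$bindir/python3.11"
PATH="$bindir:$PATH"

# Point the generic name at the specific version, as run-in-container.sh does
# for /usr/local/bin/python3.
ln -s "$(which python3.11)" "$bindir/python3"
"$bindir/python3"   # runs the stand-in through the symlink, printing 3.11
```

After the link exists, anything in the container that invokes plain `python3` resolves to the 3.11 interpreter.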

README.md

Lines changed: 17 additions & 25 deletions
@@ -3,31 +3,23 @@
 In this directory you will find text files formatted using Markdown, with an `.md` suffix.
 
 Building the site requires [Ruby 3](https://www.ruby-lang.org), [Jekyll](http://jekyllrb.com/docs), and
-[Rouge](https://github.com/rouge-ruby/rouge).
-The easiest way to install the right version of these tools is using
-[Bundler](https://bundler.io/) and running `bundle install` in this directory.
-
-See also [https://github.com/apache/spark/blob/master/docs/README.md](https://github.com/apache/spark/blob/master/docs/README.md)
-
-A site build will update the directories and files in the `site` directory with the generated files.
-Using Jekyll via `bundle exec jekyll` locks it to the right version.
-So after this you can generate the html website by running `bundle exec jekyll build` in this
-directory. Use the `--watch` flag to have jekyll recompile your files as you save changes.
-
-In addition to generating the site as HTML from the Markdown files, jekyll can serve the site via
-a web server. To build the site and run a web server use the command `bundle exec jekyll serve` which runs
-the web server on port 4000, then visit the site at http://localhost:4000.
-
-Please make sure you always run `bundle exec jekyll build` after testing your changes with
-`bundle exec jekyll serve`, otherwise you end up with broken links in a few places.
-
-## Updating Jekyll version
-
-To update `Jekyll` or any other gem please follow these steps:
-
-1. Update the version in the `Gemfile`
-1. Run `bundle update` which updates the `Gemfile.lock`
-1. Commit both files
+[Rouge](https://github.com/rouge-ruby/rouge). The most reliable way to ensure a compatible environment
+is to use the official Docker build image from the Apache Spark repository.
+
+If you haven't already, clone the [Apache Spark](https://github.com/apache/spark) repository. Navigate to
+the Spark root directory and run the following command to create the builder image:
+```
+docker build \
+  --tag docs-builder:latest \
+  --file dev/spark-test-image/docs/Dockerfile \
+  dev/spark-test-image-util/docs/
+```
+
+Once the image is built, navigate to the `spark-website` root directory and run the script, which
+processes the Markdown files inside the Docker container.
+```
+SPARK_WEBSITE_PATH="/path/to/spark-website" sh .dev/build-docs.sh
+```
 
 ## Docs sub-dir
 
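Note that `build-docs.sh` bind-mounts `$PWD`, so it must be invoked from the `spark-website` root. A hedged sketch of a pre-flight check you might run first; the `check_site_root` helper and the throwaway directory layout below are illustrative, not part of the repo:

```shell
# Illustrative pre-flight check (not part of the repo): confirm the current
# directory looks like a spark-website checkout before running the build.
check_site_root() {
  if [ -f .dev/build-docs.sh ]; then
    echo "ok"
  else
    echo "not a spark-website root" >&2
    return 1
  fi
}

# Demonstrate against a throwaway directory that mimics the layout.
demo=$(mktemp -d)
mkdir -p "$demo/.dev"
: > "$demo/.dev/build-docs.sh"
cd "$demo" && check_site_root
```

Running the real script from anywhere else would mount the wrong directory into the container, so a guard like this fails fast.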

developer-tools.md

Lines changed: 18 additions & 3 deletions
@@ -352,17 +352,32 @@ By default, this script will format files that differ from git master. For more
 
 <h3>IDE setup</h3>
 
+Make sure you have a clean start before setting up the IDE: begin from a fresh git clone of the Spark
+repo and install the latest version of the IDE.
+
+If something goes wrong, clear the build outputs with `./build/sbt clean` and `./build/mvn clean`, clear the m2
+cache with `rm -rf ~/.m2/repository/*`, re-import the project into the IDE cleanly, and try again.
+
 <h4>IntelliJ</h4>
 
 While many of the Spark developers use SBT or Maven on the command line, the most common IDE we
-use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get
-free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from `Preferences > Plugins`.
+use is IntelliJ IDEA. You need to install the JetBrains Scala plugin from `Preferences > Plugins`.
+
+Due to the complexity of the Spark build, please modify the following global settings of IntelliJ IDEA:
+
+- Go to `Settings -> Build, Execution, Deployment -> Build Tools -> Maven -> Importing`, make sure you
+choose "Detect automatically" for `Generated source folders`, and choose "generate sources" for
+`Phase to be used for folders update`.
+- Go to `Settings -> Build, Execution, Deployment -> Compiler -> Scala Compiler -> Scala Compiler Server`,
+pick a large enough number for `Maximum heap size, MB`, such as "4000".
 
 To create a Spark project for IntelliJ:
 
 - Download IntelliJ and install the
 <a href="https://confluence.jetbrains.com/display/SCA/Scala+Plugin+for+IntelliJ+IDEA">Scala plug-in for IntelliJ</a>.
-- Go to `File -> Import Project`, locate the spark source directory, and select "Maven Project".
+- Go to `File -> Import Project`, locate the Spark source directory, and select "Maven Project". It's important to
+pick Maven instead of sbt here, as Spark has complicated build logic that is implemented for sbt using Scala code
+in `SparkBuild.scala`, and IntelliJ IDEA cannot understand it well.
 - In the Import wizard, it's fine to leave settings at their default. However it is usually useful
 to enable "Import Maven projects automatically", since changes to the project structure will
 automatically update the IntelliJ project.

site/developer-tools.html

Lines changed: 20 additions & 3 deletions
@@ -481,18 +481,35 @@ <h3>Formatting code</h3>
 
 <h3>IDE setup</h3>
 
+<p>Make sure you have a clean start before setting up the IDE: begin from a fresh git clone of the Spark
+repo and install the latest version of the IDE.</p>
+
+<p>If something goes wrong, clear the build outputs with <code class="language-plaintext highlighter-rouge">./build/sbt clean</code> and <code class="language-plaintext highlighter-rouge">./build/mvn clean</code>, clear the m2
+cache with <code class="language-plaintext highlighter-rouge">rm -rf ~/.m2/repository/*</code>, re-import the project into the IDE cleanly, and try again.</p>
+
 <h4>IntelliJ</h4>
 
 <p>While many of the Spark developers use SBT or Maven on the command line, the most common IDE we
-use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get
-free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from <code class="language-plaintext highlighter-rouge">Preferences &gt; Plugins</code>.</p>
+use is IntelliJ IDEA. You need to install the JetBrains Scala plugin from <code class="language-plaintext highlighter-rouge">Preferences &gt; Plugins</code>.</p>
+
+<p>Due to the complexity of the Spark build, please modify the following global settings of IntelliJ IDEA:</p>
+
+<ul>
+<li>Go to <code class="language-plaintext highlighter-rouge">Settings -&gt; Build, Execution, Deployment -&gt; Build Tools -&gt; Maven -&gt; Importing</code>, make sure you
+choose &#8220;Detect automatically&#8221; for <code class="language-plaintext highlighter-rouge">Generated source folders</code>, and choose &#8220;generate sources&#8221; for
+<code class="language-plaintext highlighter-rouge">Phase to be used for folders update</code>.</li>
+<li>Go to <code class="language-plaintext highlighter-rouge">Settings -&gt; Build, Execution, Deployment -&gt; Compiler -&gt; Scala Compiler -&gt; Scala Compiler Server</code>,
+pick a large enough number for <code class="language-plaintext highlighter-rouge">Maximum heap size, MB</code>, such as &#8220;4000&#8221;.</li>
+</ul>
 
 <p>To create a Spark project for IntelliJ:</p>
 
 <ul>
 <li>Download IntelliJ and install the
 <a href="https://confluence.jetbrains.com/display/SCA/Scala+Plugin+for+IntelliJ+IDEA">Scala plug-in for IntelliJ</a>.</li>
-<li>Go to <code class="language-plaintext highlighter-rouge">File -&gt; Import Project</code>, locate the spark source directory, and select &#8220;Maven Project&#8221;.</li>
+<li>Go to <code class="language-plaintext highlighter-rouge">File -&gt; Import Project</code>, locate the Spark source directory, and select &#8220;Maven Project&#8221;. It&#8217;s important to
+pick Maven instead of sbt here, as Spark has complicated build logic that is implemented for sbt using Scala code
+in <code class="language-plaintext highlighter-rouge">SparkBuild.scala</code>, and IntelliJ IDEA cannot understand it well.</li>
 <li>In the Import wizard, it&#8217;s fine to leave settings at their default. However it is usually useful
 to enable &#8220;Import Maven projects automatically&#8221;, since changes to the project structure will
 automatically update the IntelliJ project.</li>

site/sitemap.xml

Lines changed: 7 additions & 7 deletions
@@ -1153,35 +1153,35 @@
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/streaming/</loc>
+<loc>https://spark.apache.org/spark-connect/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/sql/</loc>
+<loc>https://spark.apache.org/pandas-on-spark/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/mllib/</loc>
+<loc>https://spark.apache.org/graphx/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/graphx/</loc>
+<loc>https://spark.apache.org/mllib/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/screencasts/</loc>
+<loc>https://spark.apache.org/streaming/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
 <loc>https://spark.apache.org/news/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/pandas-on-spark/</loc>
+<loc>https://spark.apache.org/screencasts/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
-<loc>https://spark.apache.org/spark-connect/</loc>
+<loc>https://spark.apache.org/sql/</loc>
 <changefreq>weekly</changefreq>
 </url>
 <url>
