Enable use of remote dependencies through init-container #582
Conversation
…ort. Still look at the old one in case any Spark user is setting it explicitly, though.

Author: Marcelo Vanzin <[email protected]>

Closes apache#19983 from vanzin/SPARK-22788.
## What changes were proposed in this pull request?

Some users depend on source compatibility with the org.apache.spark.sql.execution.streaming.Offset class. Although this is not a stable interface, we can keep it in place for now to simplify upgrades to 2.3.

Author: Jose Torres <[email protected]>

Closes apache#20012 from joseph-torres/binary-compat.
## What changes were proposed in this pull request?

Unpersist unused datasets.

## How was this patch tested?

Existing tests and a local check in spark-shell.

Author: Zheng RuiFeng <[email protected]>

Closes apache#20017 from zhengruifeng/bkm_unpersist.
## What changes were proposed in this pull request?

In the previous PR apache#5755 (comment), we dropped `(-[classifier])` from the retrieval pattern. We should add it back; otherwise,

> If this pattern for instance doesn't has the [type] or [classifier] token, Ivy will download the source/javadoc artifacts to the same file as the regular jar.

## How was this patch tested?

The existing tests.

Author: gatorsmile <[email protected]>

Closes apache#20037 from gatorsmile/addClassifier.
## What changes were proposed in this pull request?

* Under the Spark Scala examples: some of the code was written in a Java style; it has been rewritten per the Scala style guide.
* Most of the changes affect println() statements.

## How was this patch tested?

Since all proposed changes rewrite println statements in the Scala way, a manual run was used to test them.

Author: chetkhatri <[email protected]>

Closes apache#20016 from chetkhatri/scala-style-spark-examples.
…assigning schedulingPool for stage

## What changes were proposed in this pull request?

In AppStatusListener's onStageSubmitted(event: SparkListenerStageSubmitted) method, there is duplicated code:

```
// schedulingPool was assigned twice with the same code
stage.schedulingPool = Option(event.properties).flatMap { p =>
  Option(p.getProperty("spark.scheduler.pool"))
}.getOrElse(SparkUI.DEFAULT_POOL_NAME)
...
stage.schedulingPool = Option(event.properties).flatMap { p =>
  Option(p.getProperty("spark.scheduler.pool"))
}.getOrElse(SparkUI.DEFAULT_POOL_NAME)
```

Assigning it twice makes no sense, and there is no comment explaining it.

## How was this patch tested?

N/A

Author: wuyi <[email protected]>

Closes apache#20033 from Ngone51/dev-spark-22847.
…ay to take time instead of int

## What changes were proposed in this pull request?

Fix a configuration that took an int but should take a time value. See the discussion in apache#19946 (comment). Made the granularity milliseconds rather than seconds, since there is a use case for sub-second reactions to scale up rapidly, especially with dynamic allocation.

## How was this patch tested?

TODO: manual run of integration tests against this PR. PTAL cc/ mccheah liyinan926 kimoonkim vanzin mridulm jiangxb1987 ueshin

Author: foxish <[email protected]>

Closes apache#20032 from foxish/fix-time-conf.
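For illustration, a minimal sketch of setting this config with a time string rather than a bare int; the config key matches the TODO list in the k8s docs commit below, and the "500ms" value is just an example:

```scala
import org.apache.spark.SparkConf

// Sketch only: after this change the delay accepts a time string with units,
// so sub-second values such as "500ms" become possible.
val conf = new SparkConf()
  .set("spark.kubernetes.allocation.batch.delay", "500ms")
```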
…h huber loss.

## What changes were proposed in this pull request?

Expose a Python API for _LinearRegression_ with _huber_ loss.

## How was this patch tested?

Unit tests.

Author: Yanbo Liang <[email protected]>

Closes apache#19994 from yanboliang/spark-22810.
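The commit exposes the Python side; for reference, a minimal sketch of the equivalent Scala API, where the `training` DataFrame and the epsilon value are placeholders:

```scala
import org.apache.spark.ml.regression.LinearRegression

// Huber loss is robust to outliers; epsilon controls where the loss
// switches from squared to linear. The values here are illustrative.
val lr = new LinearRegression()
  .setLoss("huber")
  .setEpsilon(1.35)
val model = lr.fit(training) // `training` is an assumed DataFrame of (label, features)
```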
…e options

## What changes were proposed in this pull request?

Introduce a new interface `SessionConfigSupport` for `DataSourceV2`. It helps propagate session configs with the specified key prefix to all data source operations in this session.

## How was this patch tested?

Added a new test suite, `DataSourceV2UtilsSuite`.

Author: Xingbo Jiang <[email protected]>

Closes apache#19861 from jiangxb1987/datasource-configs.
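A minimal sketch of how a source might opt in, assuming the interface exposes a single key-prefix method as described; the exact package and signature may differ from the merged version:

```scala
import org.apache.spark.sql.sources.v2.{DataSourceV2, SessionConfigSupport}

// With a prefix of "mySource", a session config such as
// spark.datasource.mySource.endpoint would be propagated to this
// source's operations as the option "endpoint".
class MyDataSource extends DataSourceV2 with SessionConfigSupport {
  override def keyPrefix(): String = "mySource"
}
```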
…orized summarizer

## What changes were proposed in this pull request?

Make several improvements to the DataFrame vectorized summarizer:

1. Make the summarizer return `Vector` type for all metrics (except "count"); previously it returned `WrappedArray`, which was not very convenient.
2. Make `MetricsAggregate` inherit the `ImplicitCastInputTypes` trait, so it can check and implicitly cast input values.
3. Add a "weight" parameter for all single-metric methods.
4. Update the doc and improve the example code in it.
5. Simplify test cases.

## How was this patch tested?

Tests added and simplified.

Author: WeichenXu <[email protected]>

Closes apache#19156 from WeichenXu123/improve_vec_summarizer.
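A sketch of the resulting usage, assuming a DataFrame `df` with a vector column "features" and a numeric column "weight" (all names are placeholders):

```scala
import org.apache.spark.ml.stat.Summarizer
import org.apache.spark.sql.functions.col

// Request several metrics at once; each requested metric comes back
// as a field of the resulting struct column, with Vector-typed values.
val stats = df.select(
  Summarizer.metrics("mean", "variance")
    .summary(col("features"), col("weight")).as("stats"))
```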
## What changes were proposed in this pull request?

This PR eliminates mutable states from the generated code for `Stack`.

## How was this patch tested?

Existing test suites.

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#20035 from kiszk/SPARK-22848.
## What changes were proposed in this pull request?

Upgrade Spark to Arrow 0.8.0 for Java and Python. Also includes an upgrade of Netty to 4.1.17 to resolve dependency requirements. The highlights that pertain to Spark for the update from Arrow version 0.4.1 to 0.8.0 include:

* Java refactoring for a simpler API
* Java reduced heap usage and streamlined hot code paths
* Type support for DecimalType, ArrayType
* Improved type casting support in Python
* Simplified type checking in Python

## How was this patch tested?

Existing tests.

Author: Bryan Cutler <[email protected]>
Author: Shixiong Zhu <[email protected]>

Closes apache#19884 from BryanCutler/arrow-upgrade-080-SPARK-22324.
## What changes were proposed in this pull request?

Moves the -Xlint:unchecked flag in the sbt build configuration from the Compile scope to the (Compile, compile) scope, allowing the publish and publishLocal commands to work.

## How was this patch tested?

Successfully published the spark-launcher subproject from within sbt, where it fails without this patch.

Author: Erik LaBianca <[email protected]>

Closes apache#20040 from easel/javadoc-xlint.
Prevents Scala 2.12 scaladoc from blowing up when attempting to parse Java comments.

## What changes were proposed in this pull request?

Adds -no-java-comments to docs/scalacOptions under Scala 2.12. Also moves the scaladoc configs out of TestSettings and into the standard sharedSettings section in SparkBuild.scala.

## How was this patch tested?

SBT_OPTS=-Dscala-2.12 sbt ++2.12.4 tags/publishLocal

Author: Erik LaBianca <[email protected]>

Closes apache#20042 from easel/scaladoc-212.
…split by CodegenContext.splitExpressions()

## What changes were proposed in this pull request?

Passing global variables to a split method is dangerous, as any mutation of them is ignored and may lead to unexpected behavior. To prevent this, one approach is to make sure no expression outputs global variables: localize the lifetime of mutable states in expressions. Another approach is, when calling `ctx.splitExpressions`, to make sure we don't use children's output as parameter names.

Approach 1 is actually hard to do, as we would need to check all expressions and operators that support whole-stage codegen. Approach 2 is easier, as the callers of `ctx.splitExpressions` are not too many. Besides, approach 2 is more flexible, as children's output may be other things that can't be parameter names: a literal, an inlined statement (a + 1), etc.

close apache#19865
close apache#19938

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <[email protected]>

Closes apache#20021 from cloud-fan/codegen.
## What changes were proposed in this pull request?

In apache#19681 we introduced a new interface called `AppStatusPlugin`, to register listeners and set up the UI for both the live and history UI. However, I think it's overkill for the live UI. For example, we should not register `SQLListener` if users are not using SQL functions. Previously we registered the `SQLListener` and set up the SQL tab when `SparkSession` was first created, which indicates users are going to use SQL functions. But in apache#19681 we register the SQL listener during `SparkContext` creation. The same applies to streaming. I think we should keep the previous behavior, and only use the new interface for the history server. To reflect this change, I also renamed the new interface to `SparkHistoryUIPlugin`.

This PR also refines the tests for the SQL listener.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <[email protected]>

Closes apache#19981 from cloud-fan/listener.
…ecision

## What changes were proposed in this pull request?

Test coverage for `WindowFrameCoercion` and `DecimalPrecision`; this is a sub-task of [SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722).

## How was this patch tested?

N/A

Author: Yuming Wang <[email protected]>

Closes apache#20008 from wangyum/SPARK-22822.
…ild's partitioning is not decided

## What changes were proposed in this pull request?

This is a follow-up PR of apache#19257, where gatorsmile had left a couple of comments about code style.

## How was this patch tested?

Doesn't change any functionality. Will depend on the build to confirm that no checkstyle rules are violated.

Author: Tejas Patil <[email protected]>

Closes apache#20041 from tejasapatil/followup_19257.
When one execution has multiple jobs, we need to append to the set of stages, not replace them on every job.

Added a unit test and ran existing tests on Jenkins.

Author: Imran Rashid <[email protected]>

Closes apache#20047 from squito/SPARK-22861.
What changes were proposed in this pull request?

This PR contains documentation on the usage of the Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build the docker images required to use the integration. The changes detailed here are covered by apache#19717 and apache#19468, which have already merged.

How was this patch tested?

The script has been in use for releases on our fork. The rest is documentation.

cc rxin mateiz (shepherd)

k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko

reviewers: vanzin felixcheung jiangxb1987 mridulm

TODO:
- [x] Add dockerfiles directory to built distribution. (apache#20007)
- [x] Change references to docker to instead say "container". (apache#19995)
- [x] Update configuration table.
- [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int. (apache#20032)

Author: foxish <[email protected]>

Closes apache#19946 from foxish/update-k8s-docs.
The code was ignoring SparkListenerLogStart, which was added somewhat recently to record the Spark version used to generate an event log.

Author: Marcelo Vanzin <[email protected]>

Closes apache#20049 from vanzin/SPARK-22854.
## What changes were proposed in this pull request?

The PR introduces a new method, `addImmutableStateIfNotExists`, to `CodeGenerator`, to allow reusing and sharing the same global variable between different Expressions. This helps reduce the number of global variables needed, which is important to limit the impact on the constant pool.

## How was this patch tested?

Added UTs.

Author: Marco Gaido <[email protected]>
Author: Marco Gaido <[email protected]>

Closes apache#19940 from mgaido91/SPARK-22750.
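An illustrative sketch of the idea, not Spark's actual signature: register each (type, name) pair once, so expressions asking for the same global variable share one declaration:

```scala
import scala.collection.mutable

// Simplified model of a code generator that deduplicates global variables.
class CodeGeneratorSketch {
  private val declared = mutable.LinkedHashMap.empty[(String, String), String]

  def addImmutableStateIfNotExists(javaType: String, name: String,
      initCode: String = ""): Unit = {
    // Only the first registration emits a declaration; later callers reuse it.
    declared.getOrElseUpdate((javaType, name), s"private $javaType $name;\n$initCode")
  }

  def declarations: String = declared.values.mkString("\n")
}
```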
…- LabeledPoint/VectorWithNorm/TreePoint

## What changes were proposed in this pull request?

Register the following classes in Kryo:

- `org.apache.spark.mllib.regression.LabeledPoint`
- `org.apache.spark.mllib.clustering.VectorWithNorm`
- `org.apache.spark.ml.feature.LabeledPoint`
- `org.apache.spark.ml.tree.impl.TreePoint`

`org.apache.spark.ml.tree.impl.BaggedPoint` seems to also need to be registered, but I don't know how to do it in a safe way. WeichenXu123 cloud-fan

## How was this patch tested?

Added tests.

Author: Zheng RuiFeng <[email protected]>

Closes apache#19950 from zhengruifeng/labeled_kryo.
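For context, a minimal sketch of explicit Kryo registration with the public SparkConf API; only the public LabeledPoint classes are shown, since the `impl` and `clustering` classes above are Spark-internal and get registered inside Spark itself:

```scala
import org.apache.spark.SparkConf

// Explicit registration avoids Kryo writing full class names with each record
// and fails fast when spark.kryo.registrationRequired is enabled.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(
    classOf[org.apache.spark.mllib.regression.LabeledPoint],
    classOf[org.apache.spark.ml.feature.LabeledPoint]))
```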
## What changes were proposed in this pull request?

The path was recently changed in apache#19946, but the dockerfile was not updated. This is a trivial one-line fix.

## How was this patch tested?

`./sbin/build-push-docker-images.sh -r spark-repo -t latest build`

cc/ vanzin mridulm rxin jiangxb1987 liyinan926

Author: Anirudh Ramanathan <[email protected]>
Author: foxish <[email protected]>

Closes apache#20051 from foxish/patch-1.
…oder

This behavior has confused some users, so let's clarify it.

Author: Michael Armbrust <[email protected]>

Closes apache#20048 from marmbrus/datasetAsDocs.
…seVersion.

## What changes were proposed in this pull request?

Currently we check the pandas version by catching whether an `ImportError` is raised for the specific imports, but we can instead compare `LooseVersion` of the version strings, the same way we check the pyarrow version.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <[email protected]>

Closes apache#20054 from ueshin/issues/SPARK-22874.
The checkstyle failures (in the Full Build test on this PR) seem unrelated to our code; maybe it's broken in upstream/master?
rerun integration test please
rerun integration test please
We are running the new integration test repo code. I modified the other builds, like unit tests, to exclude the master branch. So going forward, only the integration test will trigger from the master branch.
The failed Jenkins build with the "default" label is actually from the new integration test Jenkins job. I'll find a way to change the label.
rerun integration test please
rerun integration test please
The latest two Jenkins jobs ran the new integration tests. The "Make Distribution" job built a distro tarball off this PR. The "Integration Tests" job ran tests against the tarball. It failed because of a config issue that I just fixed.
@kimoonkim, we should see the make distribution and integration tests pass now?
I am hoping the next runs will pass. Getting there.
Ok. It seems the latest test failure is genuine. @liyinan926 Can you please take a look? Maybe your branch is outdated and needs a merge of apache/spark#20051. See http://spark-k8s-jenkins.pepperdata.org:8080/job/pr-spark-integration/5/ (the failure is quoted in the reply below).
Might be best to rebase onto upstream/master

On Fri, Dec 22, 2017 at 3:37 PM, Kimoon Kim ***@***.***> wrote:

> Ok. It seems the latest test failure is genuine. @liyinan926 Can you
> please take a look? Maybe your branch is outdated and needs a merge of
> apache/spark#20051
> From http://spark-k8s-jenkins.pepperdata.org:8080/job/pr-spark-integration/5/:
Discovery starting.
Discovery completed in 145 milliseconds.
Run starting. Expected test count is: 2
KubernetesSuite:
*** RUN ABORTED ***
  com.spotify.docker.client.exceptions.DockerException: ProgressMessage{id=null, status=null, stream=null, error=lstat dockerfiles/spark-base/entrypoint.sh: no such file or directory, progress=null, progressDetail=null}
  at com.spotify.docker.client.LoggingBuildHandler.progress(LoggingBuildHandler.java:33)
  at com.spotify.docker.client.DefaultDockerClient.build(DefaultDockerClient.java:1157)
  at org.apache.spark.deploy.k8s.integrationtest.docker.SparkDockerImageBuilder.buildImage(SparkDockerImageBuilder.scala:70)
  at org.apache.spark.deploy.k8s.integrationtest.docker.SparkDockerImageBuilder.buildSparkDockerImages(SparkDockerImageBuilder.scala:64)
  at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend.initialize(MinikubeTestBackend.scala:31)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:42)
  at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:33)
  at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.org$scalatest$BeforeAndAfter$$super$run(KubernetesSuite.scala:33)
  ...
--
Anirudh Ramanathan
Rebased onto latest upstream/master.
Integration test has passed now!
rerun integration tests please
rerun integration tests please
Closing as the upstream PR has been merged.
This is the same PR as apache#19954, but against our fork for triggering integration tests.
@kimoonkim @foxish