Skip to content

Commit d8a7a5e

Browse files
boneanxsfacebook-github-bot
authored andcommitted
fix(hdfs): Nullify hdfsFile_ before checking success status (facebookincubator#14666)
Summary: driver_->CloseFile can fail if the task is interrupted. In that case, hdfsFile_ is not reset to nullptr. Later, when the HdfsWriteFile destructor runs, it invokes close() again, resulting in a double close. This leads to a fatal JNI error: ```java 25/08/25 08:12:06 INFO [dispatcher-Executor] Executor: Executor is trying to kill task 43.2 in stage 11.0 (TID 617), reason: another attempt succeeded HdfsWriteFile.cpp:57, Function:close, Expression: success == 0 (-1 vs. 0) Failed to close hdfs file: IOException: Failed to shutdown streamer, Source: RUNTIME, ErrorCode: INVALID_STATE ``` Eventually, the JVM crashes with a segmentation fault during a JNI call to FSDataOutputStream.close(): ```java # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007fa2ec7e0c48, pid=174755, tid=0x00007fa220262700 # # JRE version: OpenJDK Runtime Environment (8.0_252-b09) (build 1.8.0_252-b09) # Java VM: OpenJDK 64-Bit Server VM (25.252-b09 mixed mode linux-amd64 compressed oops) # Problematic frame: # V [libjvm.so+0x68fc48] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x48 # ``` Stack trace excerpt showing the duplicate close() call path: ```c++ facebookincubator#8 invokeMethodOnJclass(..., "org/apache/hadoop/fs/FSDataOutputStream", "close", "()V") facebookincubator#10 hdfsCloseFile(...) facebookincubator#11 facebook::velox::HdfsWriteFile::close() facebookincubator#12 facebook::velox::HdfsWriteFile::~HdfsWriteFile() facebookincubator#14 facebook::velox::dwio::common::WriteFileSink::~WriteFileSink() facebookincubator#15 facebook::velox::parquet::Writer::abort() ``` Pull Request resolved: facebookincubator#14666 Reviewed By: gggrace14 Differential Revision: D82880508 Pulled By: xiaoxmeng fbshipit-source-id: c49467744b7784c3373faf61e8a46058f5243caf
1 parent a029fcf commit d8a7a5e

File tree

2 files changed

+35
-1
lines changed

2 files changed

+35
-1
lines changed

velox/connectors/hive/storage_adapters/hdfs/HdfsWriteFile.cpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -54,12 +54,14 @@ HdfsWriteFile::~HdfsWriteFile() {
5454

5555
void HdfsWriteFile::close() {
5656
int success = driver_->CloseFile(hdfsClient_, hdfsFile_);
57+
common::testutil::TestValue::adjust(
58+
"facebook::velox::connectors::hive::HdfsWriteFile::close", &success);
59+
hdfsFile_ = nullptr;
5760
VELOX_CHECK_EQ(
5861
success,
5962
0,
6063
"Failed to close hdfs file: {}",
6164
driver_->GetLastExceptionRootCause());
62-
hdfsFile_ = nullptr;
6365
}
6466

6567
void HdfsWriteFile::flush() {

velox/connectors/hive/storage_adapters/hdfs/tests/HdfsFileSystemTest.cpp

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
#include "gtest/gtest.h"
2222
#include "velox/common/base/Exceptions.h"
2323
#include "velox/common/base/tests/GTestUtils.h"
24+
#include "velox/common/testutil/TestValue.h"
2425
#include "velox/connectors/hive/storage_adapters/hdfs/HdfsReadFile.h"
2526
#include "velox/connectors/hive/storage_adapters/hdfs/RegisterHdfsFileSystem.h"
2627
#include "velox/connectors/hive/storage_adapters/hdfs/tests/HdfsMiniCluster.h"
@@ -525,3 +526,34 @@ TEST_F(HdfsFileSystemTest, readFailures) {
525526
std::string(miniCluster->nameNodePort()));
526527
verifyFailures(driver, hdfs);
527528
}
529+
530+
DEBUG_ONLY_TEST_F(HdfsFileSystemTest, writeFilePreventsDoubleClose) {
531+
common::testutil::TestValue::enable();
532+
533+
int closeCallCount = 0;
534+
535+
SCOPED_TESTVALUE_SET(
536+
"facebook::velox::connectors::hive::HdfsWriteFile::close",
537+
std::function<void(int*)>([&closeCallCount](int* success) {
538+
++closeCallCount;
539+
if (closeCallCount == 1) {
540+
*success = -1;
541+
}
542+
}));
543+
544+
auto writeFile = openFileForWrite("/test_double_close.txt");
545+
546+
writeFile->append("test data");
547+
writeFile->flush();
548+
549+
VELOX_ASSERT_THROW(writeFile->close(), "Failed to close hdfs file:");
550+
551+
EXPECT_EQ(closeCallCount, 1);
552+
553+
// Destructor should not call close() again because hdfsFile_ is nullptr
554+
// The closeCallCount should remain 1.
555+
writeFile.reset();
556+
EXPECT_EQ(closeCallCount, 1);
557+
558+
common::testutil::TestValue::disable();
559+
}

0 commit comments

Comments
 (0)