
Commit 4358de2

blog
1 parent 4b4bfae commit 4358de2

16 files changed: 854 lines added, 18 lines deleted

README.md

Lines changed: 27 additions & 0 deletions
@@ -41,3 +41,30 @@ This work is published under [MIT][mit] License.
 [chirpy]: https://github.com/cotes2020/jekyll-theme-chirpy/
 [CD]: https://en.wikipedia.org/wiki/Continuous_deployment
 [mit]: https://github.com/cotes2020/chirpy-starter/blob/master/LICENSE
+
+## Setting up the Environment
+
+### Setting up Natively (Recommended for Unix-like OS)
+
+For Unix-like systems, you can set up the environment natively for optimal performance, though you can also use Dev Containers as an alternative.
+
+**Steps**:
+
+1. Follow the [Jekyll installation guide](https://jekyllrb.com/docs/installation/) to install Jekyll, and make sure [Git](https://git-scm.com/) is installed.
+2. Clone your repository to your local machine.
+3. If you forked the theme, install [Node.js][nodejs] and run `bash tools/init.sh` in the root directory to initialize the repository.
+4. Run `bundle` in the root of your repository to install the dependencies.
+
+## Usage
+
+### Start the Jekyll Server
+
+To run the site locally, use the following command:
+
+```terminal
+$ bundle exec jekyll serve
+```
+
+After a few seconds, the local server will be available at <http://127.0.0.1:4000>.
+
+OKOKOKOKOK

_config.yml

Lines changed: 7 additions & 7 deletions
@@ -14,28 +14,28 @@ timezone: Asia/Shanghai
 # jekyll-seo-tag settings › https://github.com/jekyll/jekyll-seo-tag/blob/master/docs/usage.md
 # ↓ --------------------------
 
-title: LinkCaau # the main title
+title: 纸妖's Blog # the main title
 
 tagline: Break Boost and Beyond # it will display as the subtitle
 
 description: >- # used by seo meta and the atom feed
-  chaodongyue (LinkCaau)'s personal blog, mostly sharing technical topics.
+  纸妖 (chaodongyue)'s personal blog, mostly sharing technical topics.
 
 # Fill in the protocol & hostname for your site.
 # E.g. 'https://username.github.io', note that it does not end with a '/'.
-url: "https://linkcaau.github.io"
+url: "https://chaodongyue.github.io"
 
 github:
-  username: LinkCaau # change to your GitHub username
+  username: chaodongyue # change to your GitHub username
 
 social:
   # Change to your full name.
   # It will be displayed as the default author of the posts and the copyright owner in the Footer
-  name: LinkCaau
+  name: 纸妖
   email: chaodongyue@163.com # change to your email address
   links:
     # The first element serves as the copyright owner's link
-    - https://github.com/LinkCaau # change to your GitHub homepage
+    - https://github.com/chaodongyue # change to your GitHub homepage
     # Uncomment below to add more social links
     # - https://www.facebook.com/username
     # - https://www.linkedin.com/in/username
@@ -115,7 +115,7 @@ comments:
     issue_term: # < url | pathname | title | ...>
   # Giscus options › https://giscus.app
   giscus:
-    repo: LinkCaau/linkcaau.github.io
+    repo: chaodongyue/chaodongyue.github.io
     repo_id:
     category:
     category_id:
Lines changed: 311 additions & 0 deletions
@@ -0,0 +1,311 @@
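---
title: Downloading Excel as a Stream with Apache POI
date: 2025-12-12 17:00:00 +0800
categories: [Blogging, Java]
tags: [java]
---

> This article was written in 2019 and may be outdated or inaccurate.
{: .prompt-warning }

# Downloading Excel as a Stream with Apache POI

It all started with a rather crazy requirement: export a huge amount of data (millions of rows) queried from the database into an Excel file (whether the requirement makes business sense is not discussed here). We wanted to keep querying the data while streaming the output to the client.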

## Preface

The xlsx format is based on the Office Open XML standard introduced with Office 2007 ([WIKI](https://zh.wikipedia.org/wiki/Office_Open_XML)). An xlsx file is actually a zip archive: you can unzip it and look at its contents.
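
Since an xlsx file is an ordinary zip archive, the JDK's `java.util.zip` package is enough to look inside one. The following is a small sketch, not part of the original post: `listEntries` returns the entry names of any zip stream, and `sampleArchive` builds an in-memory archive whose entry names (`[Content_Types].xml`, `xl/workbook.xml`, `xl/worksheets/sheet1.xml`) are typical of an xlsx package but are hand-made placeholders, not a workbook Excel could open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class XlsxEntries {

    /** Returns the names of all entries in a zip (and therefore xlsx) stream, in archive order. */
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a tiny zip in memory with xlsx-like entry names (hand-made, not a valid workbook). */
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : new String[] {
                    "[Content_Types].xml", "xl/workbook.xml", "xl/worksheets/sheet1.xml"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Pass a FileInputStream for a real .xlsx file instead to inspect an actual workbook.
        System.out.println(listEntries(new ByteArrayInputStream(sampleArchive())));
    }
}
```

Pointed at a real .xlsx file, `listEntries` shows the same kind of layout, with one XML part per worksheet under `xl/worksheets/`.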
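
For open-source Excel export in Java, Apache POI is the only real choice, and it is well known that exporting a large amount of data with POI can cause an OOM (OutOfMemoryError).
The root cause is that, from the moment the Workbook (org.apache.poi.xssf.usermodel.XSSFWorkbook) is created until Workbook#write() is called, a large number of objects stay alive in memory.
A bit of googling shows that the POI project provides org.apache.poi.xssf.streaming.SXSSFWorkbook to solve the OOM problem. The older official approach ([Link](https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java)) still works, but it has since been folded into SXSSFWorkbook. What POI does not offer is a way to query the data and stream the Excel output at the same time. By the **law of zeros and ones** it should be possible in theory; at worst we can write the 0s and 1s by hand :)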
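
## Example

According to the official documentation, SXSSFWorkbook actually flushes the data to the local disk, with both automatic and manual flushing,
so the final write() can output everything safely without causing an OOM.

Let's see how the official example works: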
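```java
import java.io.ByteArrayOutputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

SXSSFWorkbook workbook = new SXSSFWorkbook(); // 1
Sheet sheet = workbook.createSheet(); // 2
CellStyle cellStyle = workbook.createCellStyle(); // 4
for (int i = 0; i < 60000; i++) {
    Row newRow = sheet.createRow(i); // 3
    for (int j = 0; j < 100; j++) {
        Cell cell = newRow.createCell(j);
        cell.setCellValue("test" + Math.random());
        cell.setCellStyle(cellStyle); // 3
    }
}
ByteArrayOutputStream os = new ByteArrayOutputStream();
workbook.write(os); // 4
workbook.dispose(); // delete the temporary files that backed the sheets
```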
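
### 0x01 Workbook
`new SXSSFWorkbook()` creates an SXSSFWorkbook whose flush window size (rowAccessWindowSize) is 100, i.e. the number of row objects allowed to stay in memory at once (explained below). If the OOM is caused by having too many columns, the official solution cannot help ^_^ ([technical specifications](https://zh.wikipedia.org/wiki/Microsoft_Excel#%E6%8A%80%E6%9C%AF%E6%8C%87%E6%A0%87)). The source below shows that SXSSFWorkbook by default just wraps an XSSFWorkbook: `_wb` actually points to an XSSFWorkbook.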
```java
public static final int DEFAULT_WINDOW_SIZE = 100;

public SXSSFWorkbook(){
    this(null /*workbook*/);
}

public SXSSFWorkbook(XSSFWorkbook workbook){
    this(workbook, DEFAULT_WINDOW_SIZE);
}

// All constructors end up calling this one
public SXSSFWorkbook(XSSFWorkbook workbook, int rowAccessWindowSize, boolean compressTmpFiles, boolean useSharedStringsTable){
    setRandomAccessWindowSize(rowAccessWindowSize);
    setCompressTempFiles(compressTmpFiles);
    if (workbook == null) {
        _wb=new XSSFWorkbook(); // what is actually stored is an XSSFWorkbook
        _sharedStringSource = useSharedStringsTable ? _wb.getSharedStringSource() : null;
    } else {
        _wb=workbook;
        _sharedStringSource = useSharedStringsTable ? _wb.getSharedStringSource() : null;
        for ( Sheet sheet : _wb ) {
            createAndRegisterSXSSFSheet( (XSSFSheet)sheet );
        }
    }
}
```

### 0x02 Sheet
`workbook.createSheet()` creates an SXSSFSheet, which likewise wraps an XSSFSheet; many SXSSFSheet methods simply delegate to the XSSFSheet.
```java
public SXSSFSheet createSheet()
{
    return createAndRegisterSXSSFSheet(_wb.createSheet());
}

SXSSFSheet createAndRegisterSXSSFSheet(XSSFSheet xSheet)
{
    final SXSSFSheet sxSheet;
    try
    {
        sxSheet=new SXSSFSheet(this,xSheet);
    }
    catch (IOException ioe)
    {
        throw new RuntimeException(ioe);
    }
    registerSheetMapping(sxSheet,xSheet);
    return sxSheet;
}
```
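
`new SXSSFSheet(this, xSheet)` creates a `SheetDataWriter` object. As the name suggests, it writes out a sheet's data. Tracing into it shows that creating a `SheetDataWriter` actually creates a temporary XML file with the prefix `poi-sxssf-sheet`, together with a `java.io.Writer` for that file. The XML file is placed under `%java.io.tmpdir%/poifiles`.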

```java
public SheetDataWriter() throws IOException {
    _fd = createTempFile();
    _out = createWriter(_fd);
}

public File createTempFile() throws IOException {
    return TempFile.createTempFile("poi-sxssf-sheet", ".xml");
}
```
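
To make the temp-file mechanism concrete, here is a simplified, hypothetical stand-in for `SheetDataWriter` written against the JDK only (`TempSheetWriter` is an illustrative name, not a POI class). It creates a temp file with the same `poi-sxssf-sheet` prefix directly in `java.io.tmpdir`, keeps a `Writer` open on it, and appends each row as SpreadsheetML using inline-string cells. The exact markup POI emits (styles, cell types, shared strings) differs, so treat this as a sketch of the idea only.

```java
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for POI's SheetDataWriter: rows go to a temp XML file. */
final class TempSheetWriter implements AutoCloseable {
    private final File file;
    private final Writer out;

    TempSheetWriter() throws IOException {
        // Same prefix as POI, but created with the JDK directly in java.io.tmpdir
        file = File.createTempFile("poi-sxssf-sheet", ".xml");
        out = Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8);
    }

    /** Appends one row as SpreadsheetML; rowIndex is 0-based, the r attributes are 1-based. */
    void writeRow(int rowIndex, List<String> values) throws IOException {
        int r = rowIndex + 1;
        out.write("<row r=\"" + r + "\">");
        for (int col = 0; col < values.size(); col++) {
            out.write("<c r=\"" + columnName(col) + r + "\" t=\"inlineStr\"><is><t>");
            out.write(escape(values.get(col)));
            out.write("</t></is></c>");
        }
        out.write("</row>\n");
    }

    File file() {
        return file;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    /** Converts a 0-based column index to Excel letters: 0 -> A, 25 -> Z, 26 -> AA. */
    static String columnName(int col) {
        StringBuilder sb = new StringBuilder();
        for (int n = col + 1; n > 0; n = (n - 1) / 26) {
            sb.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return sb.toString();
    }

    /** Escapes the characters that are not allowed verbatim in XML element text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        File f;
        try (TempSheetWriter w = new TempSheetWriter()) {
            w.writeRow(0, Arrays.asList("id", "name"));
            w.writeRow(1, Arrays.asList("1", "a<b"));
            f = w.file();
        }
        System.out.print(new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8));
        Files.delete(f.toPath());
    }
}
```

`columnName` produces the letter form Excel uses in each cell's `r` attribute, so the appended rows stay valid `<sheetData>` children no matter how many columns there are.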
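
### 0x03 Row
`createRow` creates a new row object on the sheet above and keeps it in memory. Once the number of rows in memory exceeds `_randomAccessWindowSize`, the SheetDataWriter flushes them to the temporary XML file on disk. Cells are stored in the Row object's `_cells` field. Flushing to disk uses the SheetDataWriter created above: `writeRow(int, SXSSFRow)` builds the row's XML and appends it to the end of the temporary file.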

```java
@Override
public SXSSFRow createRow(int rownum)
{
    int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex(); // maximum allowed row index
    if (rownum < 0 || rownum > maxrow) {
        throw new IllegalArgumentException("Invalid row number (" + rownum
                + ") outside allowable range (0.." + maxrow + ")");
    }

    // attempt to overwrite a row that is already flushed to disk
    // the row number must be greater than the last row already flushed to disk
    if(rownum <= _writer.getLastFlushedRow() ) {
        throw new IllegalArgumentException(
                "Attempting to write a row["+rownum+"] " +
                "in the range [0," + _writer.getLastFlushedRow() + "] that is already written to disk.");
    }

    // attempt to overwrite a existing row in the input template
    // the row number must be greater than the last row of the template
    if(_sh.getPhysicalNumberOfRows() > 0 && rownum <= _sh.getLastRowNum() ) {
        throw new IllegalArgumentException(
                "Attempting to write a row["+rownum+"] " +
                "in the range [0," + _sh.getLastRowNum() + "] that is already written to disk.");
    }

    SXSSFRow newRow=new SXSSFRow(this); // SXSSFRow keeps a reference to the current SXSSFSheet
    _rows.put(rownum,newRow); // row number -> row object; without flushing, this map grows until OOM
    allFlushed = false;
    // if more rows are in memory than the window allows, flush them to the temp file on disk
    if(_randomAccessWindowSize>=0&&_rows.size()>_randomAccessWindowSize)
    {
        try
        {
            flushRows(_randomAccessWindowSize);
        }
        catch (IOException ioe)
        {
            throw new RuntimeException(ioe);
        }
    }
    return newRow;
}

private void flushOneRow() throws IOException
{
    Integer firstRowNum = _rows.firstKey();
    if (firstRowNum!=null) {
        int rowIndex = firstRowNum.intValue();
        SXSSFRow row = _rows.get(firstRowNum);
        // Update the best fit column widths for auto-sizing just before the rows are flushed
        _autoSizeColumnTracker.updateColumnWidths(row);
        _writer.writeRow(rowIndex, row); // written to disk by the SheetDataWriter created above
        _rows.remove(firstRowNum);
        lastFlushedRowNumber = rowIndex;
    }
}
```
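
The window logic in `createRow` and `flushOneRow` boils down to a sorted map plus a flush of the lowest row numbers. The sketch below models it with the JDK only; `WindowedSheet` is a hypothetical class, and it writes each flushed row as a plain text line instead of POI's XML to stay short. As in the POI code above, a negative window size disables automatic flushing, and a row number at or below the last flushed row is rejected.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of SXSSFSheet's row window: at most windowSize rows stay
 * in memory; older rows are written out and dropped.
 */
final class WindowedSheet {
    private final int windowSize; // like rowAccessWindowSize; negative disables auto-flush
    private final TreeMap<Integer, String> rows = new TreeMap<>(); // row number -> row content
    private final Writer out;
    private int lastFlushedRow = -1;

    WindowedSheet(int windowSize, Writer out) {
        this.windowSize = windowSize;
        this.out = out;
    }

    void createRow(int rownum, String content) throws IOException {
        if (rownum <= lastFlushedRow) {
            throw new IllegalArgumentException("row " + rownum + " is already flushed");
        }
        rows.put(rownum, content);
        if (windowSize >= 0 && rows.size() > windowSize) {
            flushRows(windowSize);
        }
    }

    /** Flushes the lowest-numbered rows until only {@code remaining} are left in memory. */
    void flushRows(int remaining) throws IOException {
        while (rows.size() > remaining) {
            Map.Entry<Integer, String> first = rows.pollFirstEntry(); // same idea as flushOneRow()
            out.write(first.getKey() + ":" + first.getValue() + "\n");
            lastFlushedRow = first.getKey();
        }
    }

    int rowsInMemory() {
        return rows.size();
    }

    int lastFlushedRow() {
        return lastFlushedRow;
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        WindowedSheet sheet = new WindowedSheet(100, sink);
        for (int i = 0; i < 250; i++) {
            sheet.createRow(i, "row" + i);
        }
        System.out.println(sheet.rowsInMemory() + " in memory, last flushed = " + sheet.lastFlushedRow());
        sheet.flushRows(0); // flush everything, as write() does first through flushSheets()
    }
}
```

With a window of 100 and 250 rows, rows 0 to 149 have already been written out and rows 150 to 249 are still in memory, so the number of live row objects stays bounded however many rows are created.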
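
### 0x04 Workbook#write(OutputStream) and Other Objects
Objects such as CellStyle that are created through SXSSFWorkbook are, under the hood, all created by the XSSFWorkbook. These objects also stay in memory and are never written to disk. On the final output, all rows still in memory are first flushed to disk, and then the Excel template is written to disk.
The template is just an Excel file that contains the styles and so on, but no data. As the code shows, this file is written to disk in zip format, which is why, as mentioned at the beginning, you can unzip an xlsx file to inspect its contents.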

```java
public void write(OutputStream stream) throws IOException
{
    flushSheets(); // flush all in-memory data to disk

    //Save the template
    File tmplFile = TempFile.createTempFile("poi-sxssf-template", ".xlsx");
    boolean deleted;
    try {
        FileOutputStream os = new FileOutputStream(tmplFile); // save the XSSFWorkbook as the template
        try {
            _wb.write(os);
        } finally {
            os.close();
        }

        //Substitute the template entries with the generated sheet data files
        final ZipEntrySource source = new ZipFileZipEntrySource(new ZipFile(tmplFile));
        injectData(source, stream); // inject the data into the zip file
    } finally {
        deleted = tmplFile.delete();
    }
    // ... (omitted)
}

protected void injectData(ZipEntrySource zipEntrySource, OutputStream out) throws IOException {
    // ... (omitted)
    // See bug 56557, we should not inject data into the special ChartSheets
    if(xSheet!=null && !(xSheet instanceof XSSFChartSheet)) { // a regular sheet: stream its XML data file; anything else is copied as-is
        SXSSFSheet sxSheet=getSXSSFSheet(xSheet);
        InputStream xis = sxSheet.getWorksheetXMLInputStream();
        try {
            copyStreamAndInjectWorksheet(is,zos,xis); // read the XML data file and write it out
        } finally {
            xis.close();
        }
    } else {
        IOUtils.copy(is, zos);
    }
    // ... (omitted)
}

private static void copyStreamAndInjectWorksheet(InputStream in, OutputStream out, InputStream worksheetData) throws IOException {
    // ... (omitted)
    //Copy the worksheet data to "out".
    IOUtils.copy(worksheetData,out); // copy the XML data file into the output stream, i.e. to the client

    outWriter.write("</sheetData>");
    outWriter.flush();
    //Copy the rest of "in" to "out".
    while(((c=inReader.read())!=-1)) {
        outWriter.write(c);
    }
    outWriter.flush();
}
```
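
### 0x05 Summary: How POI Avoids OOM
POI splits the final Excel file into a template and the data and keeps them apart: the data is stored on disk, while the template (styles, fonts and similar information) stays in memory. On output, the template is written as an xlsx file, read back as a zip archive and re-emitted to the client entry by entry; whenever a sheet entry is encountered, it is replaced by the XML data file. Because a zip archive stores and compresses each entry independently instead of scrambling the whole file together, this can be viewed as streaming out the contents of a folder, one file at a time.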
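
The template-plus-injection trick can be sketched with `java.util.zip` alone (Java 9 or later, for `readAllBytes` and `transferTo`). This is a simplified illustration, not POI's implementation: `TemplateInjector`, its `RowProducer` callback and the assumption that the empty sheet contains a literal `<sheetData/>` placeholder are all hypothetical. Every template entry is copied through unchanged, except the chosen sheet entry, whose rows are written by the callback straight into the output zip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Sketch of "template + injected sheet data"; the names here are illustrative, not POI's. */
final class TemplateInjector {
    private static final String PLACEHOLDER = "<sheetData/>";

    /** Writes rows into the sheet XML; may throw IOException, unlike java.util.function.Consumer. */
    interface RowProducer {
        void writeRows(Writer sheetData) throws IOException;
    }

    static void inject(InputStream template, OutputStream out, String sheetEntry,
                       RowProducer producer) throws IOException {
        ZipInputStream zin = new ZipInputStream(template);
        ZipOutputStream zout = new ZipOutputStream(out);
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            zout.putNextEntry(new ZipEntry(entry.getName()));
            if (entry.getName().equals(sheetEntry)) {
                // The template sheet holds no data, so reading it fully is cheap.
                String xml = new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                int at = xml.indexOf(PLACEHOLDER);
                if (at < 0) {
                    throw new IOException("no " + PLACEHOLDER + " in " + sheetEntry);
                }
                Writer w = new OutputStreamWriter(zout, StandardCharsets.UTF_8);
                w.write(xml, 0, at);
                w.write("<sheetData>");
                producer.writeRows(w); // rows go straight into the zip entry, no data file
                w.write("</sheetData>");
                w.write(xml.substring(at + PLACEHOLDER.length()));
                w.flush(); // flush only: closing w would close the whole zip stream
            } else {
                zin.transferTo(zout); // every other entry is copied unchanged
            }
            zout.closeEntry();
        }
        zout.finish(); // write the zip directory without closing the caller's stream
    }

    /** Reads every entry of a zip into memory; only for the demo and checks. */
    static Map<String, String> readEntries(byte[] zip) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                entries.put(e.getName(), new String(zin.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
        return entries;
    }

    /** A hand-made two-entry "template"; real xlsx templates contain many more parts. */
    static byte[] sampleTemplate() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("xl/styles.xml"));
            zos.write("<styleSheet/>".getBytes(StandardCharsets.UTF_8));
            zos.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zos.write("<worksheet><sheetData/></worksheet>".getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stands in for the HTTP response
        inject(new ByteArrayInputStream(sampleTemplate()), out, "xl/worksheets/sheet1.xml", w -> {
            for (int r = 1; r <= 3; r++) {
                w.write("<row r=\"" + r + "\"/>");
            }
        });
        System.out.println(readEntries(out.toByteArray()));
    }
}
```

The row callback plays the part of the business-data query, which is exactly where streaming can hook in: nothing about the rows has to exist before `inject` reaches the sheet entry.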
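
### 0x06 Modification
There are many ways to make this change. The most complete one is to take over the whole output, from the template to the data, but that requires changing too many places and understanding too much. Given the time constraints, I extended SXSSFWorkbook instead: when injecting the data, it no longer reads from a file but reads the business data and generates the XML sheet data on the fly. The output is redirected by using reflection to point `SheetDataWriter#_out` (the Writer that produces the XML file) at the client's output stream. This reuses POI's own code without changing too much. Since `copyStreamAndInjectWorksheet` is a private method and cannot be overridden, the only option is to copy the source and override `injectData`; `injectData` also calls private methods, which can be handled with reflection.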
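
The reflection step can be tried in isolation. In the sketch below, `DataWriter` is a hypothetical stand-in for `SheetDataWriter` with a `private final Writer _out`, and `findField` is a small helper written for this example (the post's own `findField` is not shown; Spring's `ReflectionUtils.findField` serves a similar purpose). After `setAccessible(true)`, `Field.set` can replace a non-static final field, so later writes land in the new `Writer`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.lang.reflect.Field;

final class Redirect {
    /** Points the private field {@code _out} of {@code target} at {@code newOut} via reflection. */
    static void redirect(Object target, Writer newOut) throws ReflectiveOperationException {
        Field field = findField(target.getClass(), "_out");
        field.setAccessible(true); // for non-static fields this also permits writing a final field
        field.set(target, newOut);
    }

    /** Walks up the class hierarchy, because getDeclaredField ignores inherited fields. */
    static Field findField(Class<?> type, String name) throws NoSuchFieldException {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // keep looking in the superclass
            }
        }
        throw new NoSuchFieldException(name);
    }

    public static void main(String[] args) throws Exception {
        StringWriter file = new StringWriter();   // plays the role of the temp file
        StringWriter client = new StringWriter(); // plays the role of the response stream
        DataWriter writer = new DataWriter(file);
        writer.writeRow(1);
        redirect(writer, client);
        writer.writeRow(2);
        System.out.println("file=" + file + ", client=" + client);
    }
}

/** Hypothetical stand-in for SheetDataWriter: a private final Writer normally aimed at a file. */
final class DataWriter {
    private final Writer _out;

    DataWriter(Writer out) {
        this._out = out;
    }

    void writeRow(int r) throws IOException {
        _out.write("<row r=\"" + r + "\"/>");
    }
}
```

This relies on deep reflection into another library's private fields, so it can break on a POI upgrade, and on newer JDKs it only works while the target package is open to the caller (as it is for jars on the classpath).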
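
```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.Field;
import java.util.function.Consumer;

import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.streaming.SheetDataWriter;

public class StreamSXSSFWorkbook extends SXSSFWorkbook {

    /**
     * Consumes the business data and creates the rows.
     */
    private Consumer<Sheet> sheetConsumer;

    public StreamSXSSFWorkbook(int rowAccessWindowSize) {
        super(rowAccessWindowSize);
    }

    // ... (omitted: injectData(), copied from the POI source and changed to call the method below
    //      with the SXSSFSheet instead of the sheet's XML input stream)

    private void copyStreamAndInjectWorksheet(InputStream in, OutputStream out, SXSSFSheet sheet) throws IOException {
        // ... (omitted: same as the POI source; declares inReader, outWriter and c,
        //      and copies "in" to "out" up to <sheetData>)
        //Copy the worksheet data to "out".
        //IOUtils.copy(worksheetData,out); // original: copy the XML data file to the client
        // Instead, point the Writer that generates the XML data at the client's output stream
        try {
            // findField: reflection helper not shown in the post (e.g. Spring's ReflectionUtils.findField)
            Field writerField = findField(sheet.getClass(), "_writer");
            writerField.setAccessible(true);
            SheetDataWriter sheetDataWriter = (SheetDataWriter) writerField.get(sheet);

            Field outField = findField(sheetDataWriter.getClass(), "_out");
            outField.setAccessible(true);
            outField.set(sheetDataWriter, outWriter);
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        sheetConsumer.accept(sheet); // create the rows; their XML now goes straight to the client
        sheet.flushRows(); // flush the rows still held in memory (bounded by rowAccessWindowSize)

        outWriter.write("</sheetData>");
        outWriter.flush();
        //Copy the rest of "in" to "out".
        while (((c = inReader.read()) != -1)) {
            outWriter.write(c);
        }
        outWriter.flush();
    }

    public void setSheetConsumer(Consumer<Sheet> sheetConsumer) {
        this.sheetConsumer = sheetConsumer;
    }
}
```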

Usage:

```java
StreamSXSSFWorkbook wb = new StreamSXSSFWorkbook(1000);
List<CellStyle> cellStyles = initCellStyle(wb); // styles must be created before write() (see 0x07)
wb.setSheetConsumer(sheet -> {
    // initCellStyle() and data() are the application's own helpers (not shown)
    List<String> ls = data();

    Iterator<String> it = ls.iterator();
    int i = 0;
    while (it.hasNext()) {
        String val = it.next();

        CellStyle style = cellStyles.get(i);
        Row row = sheet.createRow(i);
        Cell cell = row.createCell(0);
        cell.setCellStyle(style);
        cell.setCellValue(val);

        // Drop the consumed element so the GC can reclaim it; prefer a LinkedList here,
        // because removing from the front of an ArrayList costs O(n) per call
        it.remove();
        i++;
    }
});
wb.write(out); // out: the client's OutputStream, e.g. the HTTP response body
wb.dispose();  // delete the (empty) temporary sheet files
```
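
### 0x07 Limitations, Optimizations, and Suggestions
1. I used POI 3.17. The modified `copyStreamAndInjectWorksheet` and `injectData` methods are copied from its source, so if that source changes when you upgrade POI, they must be updated accordingly.
2. Because the template is written out first and the data is injected afterwards, objects created through the Workbook, such as CellStyle, have no effect if they are created inside the sheet consumer; create them before calling setSheetConsumer and pass them in.
3. The approach above still creates temporary files, although they stay empty; this could be optimized by overriding `SheetDataWriter`. By the same reasoning, you could also skip writing the template to a temporary file and reading it back, but that change costs considerably more, even though it is also the most complete solution.
4. When exporting a large amount of data, query it page by page before writing it out, and consume the business data pop-style (remove each item as soon as it has been read) so the JVM can reclaim it as early as possible.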
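
Suggestion 4 can be sketched as follows, assuming a data source that returns one page per call (`fakePage` is a hypothetical stand-in for a paged SQL query, and `PagedExport` is an illustrative name). Each page is moved into an `ArrayDeque` and consumed with `poll()`, so every record becomes unreachable as soon as it has been written and at most one page of business data is held at a time. Rows are written as plain text lines here rather than through POI.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.IntFunction;

/** Sketch of paged loading plus pop-style consumption; records must be non-null for ArrayDeque. */
final class PagedExport {

    /** Writes every record from loadPage(0), loadPage(1), ... until an empty page; returns the row count. */
    static int export(IntFunction<List<String>> loadPage, Writer out) throws IOException {
        int rowIndex = 0;
        for (int page = 0; ; page++) {
            Deque<String> batch = new ArrayDeque<>(loadPage.apply(page));
            if (batch.isEmpty()) {
                return rowIndex;
            }
            String record;
            while ((record = batch.poll()) != null) { // poll() removes the record: "pop" style
                out.write(rowIndex++ + ":" + record + "\n");
            }
        }
    }

    /** Hypothetical data source: 25 records served in pages of 10. */
    static List<String> fakePage(int page) {
        List<String> rows = new ArrayList<>();
        for (int i = page * 10; i < Math.min(page * 10 + 10, 25); i++) {
            rows.add("record" + i);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        System.out.println(export(PagedExport::fakePage, out) + " rows written");
    }
}
```

`ArrayDeque.poll()` removes from the head in constant time, unlike removing the first element of an `ArrayList`, which shifts every remaining element.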

_posts/2025-12-12-hello-blog.md

Lines changed: 0 additions & 9 deletions
This file was deleted.
