BigQuery has the following quota policy for load files, so it is better to split output files every 4 GB.
| File Type | Compressed | Uncompressed |
| --- | --- | --- |
| CSV | 4 GB | With new-lines in strings: 4 GB. Without new-lines in strings: 5 TB |
| JSON | 4 GB | 5 TB |
Problems
- Files have to be split at a line ending (CRLF/LF/CR), not at an arbitrary byte offset chosen by file size alone.
- Splitting the data before it is written is better than splitting the output files afterwards, because Embulk runs multiple tasks in parallel across multiple CPU cores.
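As a rough illustration of the first point, here is a minimal Python sketch of size-limited splitting that only cuts at line endings. It is not Embulk code: the function name `split_at_line_boundaries` and the `max_bytes` parameter are hypothetical, and the sketch assumes LF or CRLF line endings (a bare CR is not treated as a line ending) and does not handle new-lines embedded in quoted CSV strings.

```python
import io


def split_at_line_boundaries(stream, max_bytes):
    """Yield lists of lines whose combined size stays at or under max_bytes.

    `stream` is a binary file-like object. Iterating it yields one line at a
    time, split on b"\n", so a CRLF pair always stays attached to its line.
    A single line longer than max_bytes is emitted as its own chunk.
    """
    chunk, size = [], 0
    for line in stream:
        # Close the current chunk before it would exceed the limit.
        if chunk and size + len(line) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk


# Tiny demonstration with a 10-byte limit instead of 4 GB.
data = io.BytesIO(b"a,1\r\nbb,2\ncc,3\n")
chunks = list(split_at_line_boundaries(data, 10))
```

With a real file, `max_bytes` would be set somewhat below the 4 GB limit from the table above, and each yielded chunk would be written to its own output file. This only illustrates where to cut. It does not address the second point, where each parallel task would need to apply the limit to its own output.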