I work with a lot of CSV data, most of which is stored in S3, and wanted an easy way to preview files before downloading them in full. Unfortunately, the AWS CLI does not make it easy to partially read files into a buffer, so I created this tool to make that a lot easier. It works similarly to the Linux `head` command, but does not have the exact same interface, so be warned.
```
go build .
```

Then copy the `s3head` binary to your bin directory of choice.
s3head is indifferent to the type of file you pass in. It simply iterates over the lines in the file.
You must, of course, be authenticated to AWS to use this command. s3head defers AWS authentication to `session.NewSession` from `aws-sdk-go/aws/session`, so the standard credential chain (environment variables, shared credentials file, IAM role) applies.
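To give a sense of what happens under the hood, here is a minimal sketch of the core read loop, assuming aws-sdk-go v1 (the variable names and URL handling are illustrative, not lifted from the actual source):

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/url"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Illustrative inputs; s3head parses these from its arguments.
	rawURL, n := "s3://my-bucket/path/to/my/key", 5

	// Split s3://bucket/path/to/key into bucket and key.
	u, err := url.Parse(rawURL)
	if err != nil || u.Scheme != "s3" {
		log.Fatalf("not an s3:// URL: %s", rawURL)
	}
	bucket, key := u.Host, strings.TrimPrefix(u.Path, "/")

	// NewSession picks up credentials from the usual places:
	// environment variables, the shared credentials file, or an IAM role.
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	out, err := svc.GetObject(&s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer out.Body.Close()

	// Stream the body line by line and stop after n lines.
	scanner := bufio.NewScanner(out.Body)
	for i := 0; i < n && scanner.Scan(); i++ {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```

Because the body is a streaming reader that gets closed after the first `n` lines, only a prefix of the object ever crosses the wire, which is what makes previews cheap.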
```
# Prints the first five lines of the file
s3head s3://my-bucket/path/to/my/key
```

```
# Prints the first 10 lines of the file
s3head -n 10 s3://my-bucket/path/to/my/key
```

```
# Prints the entire file
s3head -a s3://my-bucket/path/to/my/key
```
```
# pipes output to xsv
# https://github.com/BurntSushi/xsv
s3head -n 1000 s3://my-bucket/path/to/my/csv/file \
  | xsv select "firstname,lastname" \
  | xsv sample 10
```

```
# pipes output to jq
# https://github.com/jqlang/jq
s3head -a s3://my-bucket/path/to/my/json/file \
  | jq .my_key
```

```
# downloads the entire file to disk
s3head -a s3://my-bucket/path/to/my/csv/file > myfile.csv
```

```
# works on gzipped files too
s3head -n 1000 s3://my-bucket/path/to/my/file.csv.gz \
  | xsv headers
```
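That last example reads a gzipped object directly; as the comparison below suggests, s3head decompresses gzipped streams for you. A minimal sketch of one way to implement that, assuming detection by key extension (the `maybeGunzip` helper and the local-file demo around it are illustrative, not taken from the actual source):

```go
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

// maybeGunzip wraps body in a gzip reader when the key looks gzipped,
// so the line-scanning code never has to care about the encoding.
// A real tool might instead inspect the object's Content-Encoding.
func maybeGunzip(key string, body io.Reader) (io.Reader, error) {
	if strings.HasSuffix(key, ".gz") {
		return gzip.NewReader(body)
	}
	return body, nil
}

func main() {
	// Demo on a local file so the sketch runs without AWS:
	//   go run . some/file.csv.gz
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r, err := maybeGunzip(os.Args[1], f)
	if err != nil {
		log.Fatal(err)
	}

	// Print the first two lines, gzipped or not.
	scanner := bufio.NewScanner(r)
	for i := 0; i < 2 && scanner.Scan(); i++ {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```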
- For some reason, attempting to pipe the stream from `aws s3api get-object` consistently results in a Broken Pipe error, which doesn't look very clean.
- Working with gzipped data is a lot more concise with s3head:
```
aws s3api get-object --bucket my-bucket --key path/to/my/key.gz /dev/stdout \
  | gunzip -c \
  | head -n 2
```

versus
```
s3head -n 2 s3://my-bucket/path/to/my/key.gz
```

The following projects attempt to solve a similar problem to s3head. Why use s3head over these other solutions? Perhaps you like the API better, or perhaps it feels faster because it's written in Go and seems more "modern".