Background

I work with a lot of CSV data, most of which is stored in S3, and wanted an easy way to get a small preview of a file before downloading the whole thing. Unfortunately, the AWS CLI does not make it easy to read only part of a file, so I created this tool to fill that gap. It works similarly to the Linux `head` command, but it does not have exactly the same interface, so be warned.

Installation

go build ./

Then copy the s3head binary to your bin directory of choice.

Usage

s3head is indifferent to the type of file you pass in. It simply iterates over the lines in the file.

AWS Authentication

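You must be authenticated with AWS to use this command. s3head defers authentication to the `NewSession` function in `aws-sdk-go/aws/session`, so credentials are resolved through the SDK's default credential chain (for example, environment variables or the shared credentials file).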

Default behavior

# Prints the first five lines in the file

s3head s3://my-bucket/path/to/my/key
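s3head takes the object as an `s3://bucket/key` URI. A minimal Go sketch of splitting such a URI, assuming the bucket is everything between `s3://` and the first `/` and the remainder is the key; `parseS3URI` is a hypothetical helper, not part of s3head.

```go
package main

import (
	"fmt"
	"strings"
)

// parseS3URI splits s3://bucket/path/to/key into bucket and key.
// Hypothetical helper for illustration, not s3head's source.
func parseS3URI(raw string) (bucket, key string, err error) {
	const prefix = "s3://"
	if !strings.HasPrefix(raw, prefix) {
		return "", "", fmt.Errorf("not an s3:// URI: %q", raw)
	}
	rest := strings.TrimPrefix(raw, prefix)
	// Everything before the first slash is the bucket; the rest is the key.
	bucket, key, ok := strings.Cut(rest, "/")
	if !ok || bucket == "" || key == "" {
		return "", "", fmt.Errorf("expected s3://bucket/key, got %q", raw)
	}
	return bucket, key, nil
}

func main() {
	bucket, key, err := parseS3URI("s3://my-bucket/path/to/my/key")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(bucket) // my-bucket
	fmt.Println(key)    // path/to/my/key
}
```

Plain string splitting is used instead of `net/url` so that characters such as `%` in an object key are passed through unchanged rather than percent-decoded.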

Specify number of lines

# Prints the first 10 lines of the file

s3head -n 10 s3://my-bucket/path/to/my/key

Grab the entire file

s3head -a s3://my-bucket/path/to/my/key
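The `-n` and `-a` flags map naturally onto Go's standard `flag` package. The sketch below is hypothetical (the `options` type and `parseOptions` function are invented here); only the flag names and the five-line default come from this README.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// options holds the settings described in this README.
type options struct {
	lines int    // -n: number of lines to print (default 5)
	all   bool   // -a: print the whole object
	uri   string // positional argument, e.g. s3://my-bucket/path/to/my/key
}

// parseOptions is a hypothetical sketch of flag handling, not s3head's source.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("s3head", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller reports errors itself
	var opts options
	fs.IntVar(&opts.lines, "n", 5, "number of lines to print")
	fs.BoolVar(&opts.all, "a", false, "print the entire file")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if fs.NArg() != 1 {
		return options{}, fmt.Errorf("expected exactly one s3:// URI, got %d arguments", fs.NArg())
	}
	opts.uri = fs.Arg(0)
	if opts.all {
		opts.lines = -1 // -1 means "no line limit"
	}
	return opts, nil
}

func main() {
	opts, err := parseOptions([]string{"-n", "10", "s3://my-bucket/path/to/my/key"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(opts.lines, opts.uri) // 10 s3://my-bucket/path/to/my/key
}
```

Go's `flag` package stops parsing at the first non-flag argument, so in this sketch the flags must come before the URI, as they do in every example here.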

Pipe the output

# pipes output to xsv
# https://github.com/BurntSushi/xsv

s3head -n 1000 s3://my-bucket/path/to/my/csv/file \
    | xsv select "firstname,lastname" \
    | xsv sample 10

# pipes output to jq
# https://github.com/jqlang/jq

s3head -a s3://my-bucket/path/to/my/json/file \
    | jq .my_key

Save to a file

s3head -a s3://my-bucket/path/to/my/csv/file > myfile.csv

Automatic GZIP Decompression

# Decompresses the gzipped object on the fly and lists its CSV headers with xsv

s3head -n 1000 s3://my-bucket/path/to/my/file.csv.gz \
    | xsv headers

Why not use the AWS CLI `s3api get-object` command instead?

For some reason, piping the stream from `aws s3api get-object` consistently results in a broken pipe error, which doesn't look very clean.

Working with gzipped data is also a lot more concise with s3head:

aws s3api get-object --bucket my-bucket --key path/to/my/key.gz /dev/stdout \
    | gunzip -c \
    | head -n 2

versus

s3head -n 2 s3://my-bucket/path/to/my/key.gz

Similar Projects

The following projects appear to solve a similar problem to s3head. Why use s3head instead? Perhaps you prefer its interface, or perhaps it feels faster and more "modern" because it's written in Go.

s3streamcat

s3curl

dbragdon1/s3head
