diff --git a/README.md b/README.md index ae17454..de83f3f 100644 --- a/README.md +++ b/README.md @@ -160,6 +160,11 @@ A basic Stroom index designed for `event-logging` XML. | [v1.0](https://github.com/gchq/stroom-content/releases/tag/example-index-v1.0) | No | No | Y | +### Other Content + +* _Proxy_ + * [squidplus-proxy](./source/proxy/squidplus-proxy/README.md) `Agent and Stroom content` + ## Building the content packs Each content pack is defined as a directory within _stroom-content-source_ with the name of content pack being the name of the directory. diff --git a/source/proxy/squidplus-proxy/CHANGELOG.md b/source/proxy/squidplus-proxy/CHANGELOG.md new file mode 100644 index 0000000..696d90d --- /dev/null +++ b/source/proxy/squidplus-proxy/CHANGELOG.md @@ -0,0 +1,22 @@ +# Change Log + +All notable changes to this content pack will be documented in this file. + +The format is based on [Keep a Changelog](http://keepachangelog.com/) +and this project adheres to [Semantic Versioning](http://semver.org/). + +## [Unreleased] + +### Added + +### Changed + +### Removed + +## [squidplus-proxy-v1.0] + +Initial version. + + +[Unreleased]: https://github.com/gchq/stroom-content/compare/squidplus-proxy-v1.0...HEAD +[squidplus-proxy-v1.0]: https://github.com/gchq/stroom-content/compare/squidplus-proxy-v1.0...squidplus-proxy-v1.0 diff --git a/source/proxy/squidplus-proxy/README.md b/source/proxy/squidplus-proxy/README.md new file mode 100644 index 0000000..ccf018e --- /dev/null +++ b/source/proxy/squidplus-proxy/README.md @@ -0,0 +1,54 @@ +# _squidplus-proxy_ Content Pack + +## Summary + +The _squidplus-proxy_ Content Pack provides both client artefacts that acquire and then post Squid access logs to a Stroom instance, and Stroom content artefacts that normalise the Squid access logs into the Stroom [`event-logging-schema`](https://github.com/gchq/event-logging-schema) format. + +This package does not use the standard Squid access log format, but a bespoke format called _squidplus_. This format records additional information such as port numbers, complete request and response headers, additional status information, data transfer sizes and next hop host information. + +Client deployment information can be found in the supplied [README](clientArtefacts/README.md) file. + + +## Stroom Contents + +The following represents the folder structure and content that will be imported into Stroom with this content pack. + +* _Event Sources/Proxy/Squid-Plus-XML_ + * **Squid-Plus-XML-V1.0-EVENTS** `Feed` + + The feed used to store and process Squid Proxy events in the enriched SquidPlus Proxy XML format (see the posting sketch after this list). + + * **Squid-Plus-XML-V1.0-EVENTS** `Xslt` + + The XSLT translation that converts the SquidPlus Proxy XML format into `event-logging` XML. + + * **Squid-Plus-XML-V1.0-EVENTS** `Pipeline` + + The pipeline that processes the SquidPlus Proxy XML format into `event-logging` XML.
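+
+As a rough sketch of how data reaches this feed (illustrative only: the datafeed URL and batch file name below are placeholders, and the real client script adds a number of extra metadata headers), the supplied `squid_stroom_feeder.sh` effectively performs a gzipped POST for each batch of converted logs:
+
+```
+# Illustrative only: a stripped-down version of the POST made by squid_stroom_feeder.sh.
+# The datafeed URL and the gzipped batch file name are placeholders.
+curl -k --data-binary @squid_batch.xml.gz "https://stroomp00.strmdev00.org/stroom/datafeed" \
+  -H "Feed:Squid-Plus-XML-V1.0-EVENTS" \
+  -H "Compression:GZIP"
+```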
+ +### Dependencies + +| Content pack | Version | Notes | +|:------------ |:------- |:----- | +| [`template-pipelines` Content Pack](../../../template-pipelines/README.md) | [v0.3](https://github.com/gchq/stroom-content/releases/tag/template-pipelines-v0.3) | Content pack element is the Event Data (XML) Pipeline | +| [`event-logging-xml-schema` Content Pack](../../../event-logging-xml-schema/README.md) | [v3.2.3](https://github.com/gchq/stroom-content/releases/tag/event-logging-xml-schema-v3.2.3) | Content pack element is the Event Logging Schema | + +## Client Contents + +The client artefacts are: + +* **README.md** `Document` + + Basic documentation to configure and deploy the SquidPlus logging capability on a Linux Squid server. + +* **squidplusXML.pl** `Script - Perl` + + Perl script that ingests squidplus format Squid logs, corrects for possible errant log lines (due to large request/response header values), resolves IP addresses to fully qualified domain names where possible, and converts the result to a simple XML format. + +* **squid_stroom_feeder.sh** `Script - Bash` + + Bash script that orchestrates the rolling over of the Squid logs, runs the **squidplusXML.pl** script and then posts the resultant output to the appropriate feed within Stroom. + +## Documentation Contents + +There are no separate documentation artefacts. diff --git a/source/proxy/squidplus-proxy/build.gradle b/source/proxy/squidplus-proxy/build.gradle new file mode 100644 index 0000000..9cf3f50 --- /dev/null +++ b/source/proxy/squidplus-proxy/build.gradle @@ -0,0 +1,7 @@ +//squidplus-proxy + +dependencies { + compileSource project(path: ':event-logging-xml-schema', configuration: 'distConfig') + compileSource project(path: ':template-pipelines', configuration: 'distConfig') +} + diff --git a/source/proxy/squidplus-proxy/clientArtefacts/README.md b/source/proxy/squidplus-proxy/clientArtefacts/README.md new file mode 100644 index 0000000..32a937c --- /dev/null +++ b/source/proxy/squidplus-proxy/clientArtefacts/README.md @@ -0,0 +1,85 @@ +# Synopsis +The script `squid_stroom_feeder.sh` and its supporting Perl script, `squidplusXML.pl`, are designed to run from a crontab entry that periodically collects and enriches events from an appropriately configured single Squid Proxy service and posts the enriched events to an instance of Stroom. It is expected that: + + - your Squid deployment has been set up to generate SquidPlus format logs (see later) + - Stroom has been configured to accept streams of events in the Squid-Plus-XML-V1.0-EVENTS feed. + +If you need to deploy multiple Squid Proxy services on the same system then you WILL need to modify the `squid_stroom_feeder.sh` and Squid configurations to cater for the multiple instances.
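+
+A minimal sketch of such a crontab entry is shown below. It is illustrative only: the installation path of the script is a placeholder, and the ten-minute schedule is simply an example that roughly matches the script's internal `MAX_SLEEP` window.
+
+```
+# Illustrative only - run the feeder roughly every 10 minutes from root's crontab;
+# the script location is a placeholder and the log file is just an example.
+*/10 * * * * /usr/local/bin/squid_stroom_feeder.sh >> /var/log/squid/stroom_squid_post.log 2>&1
+```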
+NOTE: Currently the SquidPlus format captures both 'original receive request header' (%>h) and 'reply header' (%a/%>p %la/%>lp %Ss/%>Hs/%st/%>sh %mt %rm "%ru" "%un" %Sh "%>h" "%> /var/log/squid/stroom_squid_post.log +``` + diff --git a/source/proxy/squidplus-proxy/clientArtefacts/squid_stroom_feeder.sh b/source/proxy/squidplus-proxy/clientArtefacts/squid_stroom_feeder.sh new file mode 100644 index 0000000..8d6f823 --- /dev/null +++ b/source/proxy/squidplus-proxy/clientArtefacts/squid_stroom_feeder.sh @@ -0,0 +1,452 @@ + +# Release 1.3 - 20201229 Burn Alting +# - Modify gain_iana_timezone to sed out not just zoneinfo but zoneinfo/(posix|right|leaps) +# Release 1.2 - 20200515 Burn Alting +# - Support multiple URL destinations +# Release 1.1 - 20190630 Burn Alting +# - Gain host's Canonical Timezone TZ database name (Australia/Sydney, Europe/London, etc +# - https://www.iana.org/time-zones) and pass in post +# Release 1.0 - 20170623 Burn Alting - burn@swtf.dyndns.org +# - Initial Release + +# This script +# - on start up delays for a random period of time before proceeding. This is intended to +# inject random transmission load across a network of many systems generating audit. +# - calls '/usr/sbin/squid -k rotate' to cause squid to rotate it's log files +# - processes all rotated logs via the supporting squidplusXML.pl perl script leaving files in a queue directory +# - concatenate all raw logs into /var/log/squid/access.log +# - compresses then attempts to post the logs from the queue directory and removes them from the queue directory on successful post +# If multiple URLs are provided then each URL is applied in turn. Consideration should be given to reducing the curl connection +# timeout, C_TMO, if multiple URLs are provide + +# +# Note this script will need to change if it is to support multiple Squid instances. + + +# USAGE: +# stroom_feeder.sh [-n] +# -n prevents the random sleep prior to processing +# +Usage="Usage: `basename $0` [-n]" + +Arg0=`basename $0` +LCK_FILE=/tmp/$Arg0.lck # Note safe if multiple scripts execute at same time +THIS_PID=`echo $$` + +# We should normally sleep before processing data +NoSleep=0 + +# Check args +while getopts "n" opt; do + case $opt in + n) + NoSleep=1 + ;; + \?) + echo "$0: Invalid option -$OPTARG" + echo $Usage + exit 1 + ;; + esac +done + +# SYSTEM - Name of System +SYSTEM="My Squid Service" + +# ENVIRONMENT - Application environment +# Can be Production, QualityAssurance or Development +ENVIRONMENT="Production" + +# URL - URL for posting gzip'd audit log files +# +# This should NOT change without consultation with Audit Authority +URL=https://stroomp00.strmdev00.org/stroom/datafeed + +# Split if we have multiple URLS (comma separated) into the array urls and note it's size +IFS="," read -ra urls <<< "${URL}" +urls_idx=0 +urls_mod=${#urls[@]} + +# mySecZone - Security zone if pertinant +# +# We set this typically externally to the base script script source +mySecZone="none" + +# VERSION - The version of the log source +# +# This is to allow one to distinguish between different versions of the capability +# generating logs. If you have strong version control on the logging element of +# your application, you can use your release of the installed utility version. 
+# Samples are +# Basic extraction from a rpm package +# VERSION=`rpm -q httpd` +#UBUNTU VERSION=`dpkg --status squid | awk '{if ($1 == "Version:") print $2;}'` +VERSION=`rpm -q squid` + + +# FEED_NAME - Name of Stroom feed (asssigned by Audit Authority) +# +# This is the Stroom feed name we post the collected audit events to. +FEED_NAME="Squid-Plus-XML-V1.0-EVENTS" + +# FAILED_RETENTION - Retention period to hold logs that failed to transmit (days) +# +# This period, in days, is to allow a log source to temporarily maintain local copies +# of failed to transmit logs. +FAILED_RETENTION=90 + +# FAILED_MAX - Specify a storage limit on logs that failed to transmit (512-byte blocks) +# +# As well as a retention period for logs that failed to transmit, we also +# limit the size of this archive in terms byte. +# The value is in 512-byte blocks rather than bytes +# For example, 1GB is +# 1 GiB = 1 * 1024 * 2048 = 2097152 +# 8 GiB = 8 * 1024 * 2048 = 16777216 +FAILED_MAX=16777216 + +# MAX_SLEEP - Time to delay the processing and transmission of logs +# +# To avoid audit logs being transmitted from the estate at the same time, we will +# delay a random number of seconds up to this maximum before processing and +# +# This value should NOT be changed with permission from Audit Authority. It should +# also be the periodicity of the calling of the feeding script. That is, cron +# should call the feeding script every MAX_SLEEP seconds +MAX_SLEEP=580 + +# C_TMO - Maximum time in seconds to allow the connection to the server to take +# Consider changing this value if multiple URLS are provided +C_TMO=37 + +# M_TMO - Maximum time in seconds to allow the whole operation to take +M_TMO=1200 + +# STROOM_LOG_SOURCE - Source location of logs +# STROOM_LOG_QUEUED - Directory to queue logs ready for transmission +# PrimaryLog - Filename of main log file +STROOM_LOG_SOURCE=/var/log/squid/squidCurrent +STROOM_LOG_QUEUED=/var/log/squid/squidlogQueue +PrimaryLog=/var/log/squid/access.log + +# ROUTINES: + +# clean_store() +# Args: +# $1 - root - the root of the archive directory to clean +# $2 - retention - the retention period in days before archiving +# $3 - maxsize - the maximum size in (512-byte) blocks allowed in archive +# +# Ensure any local archives of logs are limited in size and retention period +clean_store() +{ + if [ $# -ne 3 ] ; then + echo "$Arg0: Not enough args calling clean_archive()" + return + fi + root=$1 + retention=$2 + maxsize=$3 + + # Just to be paranoid + if [ ${root} = "/" ]; then + echo "$Arg0: Cannot clean_archive root filesystem" + return + fi + + # We first delete files older than the retention period + find ${root} -type f -mtime +${retention} -exec rm -f {} \; + + # First cd to ${root} so we don't need shell expansion on + # the ls command below. + myloc=`pwd` + cd ${root} + # We next delete based on the max size for this store + s=`du -s --block-size=512 . | cut -f1` + while [ ${s} -gt ${maxsize} ]; do + ls -t | tail -5 | xargs rm -f + s=`du -s --block-size=512 . 
| cut -f1` + done + cd ${myloc} + return +} + +# logmsg() +# Args: +# $* - arguements to echo +# +# Print a message prefixed with a date and the program name +logmsg() { + NOW=`date +"%FT%T.000%:z"` + echo "${NOW} ${Arg0} `hostname`: $*" +} + +# stroom_get_lock() +# Args: +# none +# +# Obtain a lock to prevent duplicate execution +stroom_get_lock() { + + if [ -f "${LCK_FILE}" ]; then + MYPID=`head -n 1 "${LCK_FILE}"` + TEST_RUNNING=`ps -p ${MYPID} | grep ${MYPID}` + + if [ -z "${TEST_RUNNING}" ]; then + logmsg "Obtained lock for ${THIS_PID}" + echo "${THIS_PID}" > "${LCK_FILE}" + else + logmsg "Sorry ${Arg0} is already running[${MYPID}]" + # If the lock file is over thee hours old remove it. Basically remove clearly stale lock files + find ${LCK_FILE} -mmin +180 -exec rm -f {} \; + exit 0 + fi + else + logmsg "Obtained lock for ${THIS_PID} in ${LCK_FILE}" + echo "${THIS_PID}" > "${LCK_FILE}" + fi +} + +# stroom_rm_lock() +# Args: +# none +# +# Remove lock file + +stroom_rm_lock() { + if [ -f ${LCK_FILE} ]; then + logmsg "Removed lock ${LCK_FILE} for ${THIS_PID}" + rm -f ${LCK_FILE} + fi +} + +# gain_iana_timezone() +# Args: +# null +# +# Gain the host's Canonical timezone +# The algorithm in general is +# if /etc/timezone then +# This is a ubuntu scenario +# cat /etc/timezone +# elif /etc/localtime is a symbolic link and /usr/share/zoneinfo exists +# # This is a RHEL/BSD scenario. Get the filename in the database directory +# readlink /etc/localtime | sed -e 's@.*share/zoneinfo/\(posix\|right\|leaps\)/\|.*share/zoneinfo/@@' +# elif /etc/localtime is a file and /usr/share/zoneinfo exists +# # This is also a RHEL/BSD scenario. Get the filename in the database directory by brute force comparison +# find /usr/share/zoneinfo -type f ! -name 'posixrules' -exec cmp -s {} /etc/localtime \; -print | sed -e 's@.*share/zoneinfo/\(posix\|right\|leaps\)/\|.*share/zoneinfo/@@' | head -n1 +# elif /etc/TIMEZONE exists +# # This is for Solaris for completeness. Get the TZ value. May need to delete double quotes +# grep 'TZ=' /etc/TIMEZONE | cut -d= -f2- | sed -e 's/"//g' +# else +# nothing +# +gain_iana_timezone() +{ + if [ -f /etc/timezone ]; then + # Ubuntu based + cat /etc/timezone + elif [ -h /etc/localtime -a -d /usr/share/zoneinfo ]; then + # RHEL/BSD based + readlink /etc/localtime | sed -e 's@.*share/zoneinfo/\(posix\|right\|leaps\)/\|.*share/zoneinfo/@@' + elif [ -f /etc/localtime -a -d /usr/share/zoneinfo ]; then + # Older RHEL based + find /usr/share/zoneinfo -type f ! -name 'posixrules' -exec cmp -s {} /etc/localtime \; -print | sed -e 's@.*share/zoneinfo/\(posix\|right\|leaps\)/\|.*share/zoneinfo/@@' | head -n1 + fi +} + +# send_to_stroom() +# Args: +# $1 - the log file +# +# Send the given log file to the Stroom Web Service. + +send_to_stroom() { + logFile=$1 + logSz=`ls -sh ${logFile} | cut -d' ' -f1` + + # Create a string of local metadata for transmission. We start with the shar and filename. Note we can only + # have arguments added to the string if we can be assured they do not have embedded spaces + hostArgs="-H Shar256:`sha256sum -b ${logFile} | cut -d' ' -f1` -H LogFileName:`basename ${logFile}`" + myHost=`hostname --all-fqdns 2> /dev/null` + if [ $? 
-ne 0 ]; then + myHost=`hostname` + fi + myIPaddress=`hostname --all-ip-addresses 2> /dev/null` + + myDomain=`hostname -d 2>/dev/null` + if [ -n "${myDomain}" ]; then + myNameserver=`dig ${myDomain} SOA +time=3 +tries=2 +noall +answer +short 2>/dev/null | head -1 | cut -d' ' -f1` + if [ -n "$myNameserver" ]; then + hostArgs="${hostArgs} -H MyNameServer:\"${myNameserver}\"" + else + # Let's try dumb and see if there is a name server in /etc/resolv.conf and choose the first one + h=`egrep '^nameserver ' /etc/resolv.conf | head -1 | cut -f2 -d' '` + if [ -n "${h}" ]; then + h0=`host $h | gawk '{print $NF }'` + if [ -n "${h0}" ]; then + hostArgs="${hostArgs} -H MyNameServer:\"${h0}\"" + elif [ -n "${h}" ]; then + hostArgs="${hostArgs} -H MyNameServer:\"${h}\"" + fi + fi + fi + fi + # Gather various configuration details via facter(1) command if available + if hash facter 2>/dev/null; then + # Redirect facter's stderr as we may not be root + myMeta=`facter 2>/dev/null | awk '{ +if ($1 == "fqdn") printf "FQDN:%s\\\n", $3; +if ($1 == "uuid") printf "UUID:%s\\\n", $3; +if ($1 ~ /^ipaddress/) printf "%s:%s\\\n", $1, $3; +}'` + if [ -n "${myMeta}" ]; then + hostArgs="${hostArgs} -H MyMeta:\"${myMeta}\"" + fi + fi + # Local time zone + ltz=`date +%z` + if [ -n "${ltz}" ]; then + hostArgs="${hostArgs} -H MyTZ:${ltz}" + fi + + # Local Canonical Timezone + ctz=`gain_iana_timezone` + if [ -n "${ltz}" ]; then + hostArgs="${hostArgs} -H MyCanonicalTZ:${ctz}" + fi + + # Do the transfer. + # We loop through the urls array. We use the index urls_idx to iterate over the array + # this way, if we have a few failures to post, then the index will be at the successful + # url if we have multiple files to post + u=${urls[$urls_idx]} + _i=0 + while [ $_i -lt $urls_mod ]; do + + # For two-way SSL authentication replace '-k' below with '--cert /path/to/server.pem --cacert /path/to/root_ca.crt' on the curl cmds below + + # If not two-way SSL authentication, use the -k option to curl + if [ -n "${mySecZone}" -a "${mySecZone}" != "none" ]; then + RESPONSE_HTTP=`curl -k --connect-timeout ${C_TMO} --max-time ${M_TMO} --data-binary @${logFile} ${u} \ +-H "Feed:${FEED_NAME}" -H "System:${SYSTEM}" -H "Environment:${ENVIRONMENT}" -H "Version:${VERSION}" \ +-H "MyHost:\"${myHost%"${myHost##*[![:space:]]}"}\"" \ +-H "MyIPaddress:\"${myIPaddress%"${myIPaddress##*[![:space:]]}"}\"" \ +-H "MySecurityDomain:\"${mySecZone%"${mySecZone##*[![:space:]]}"}\"" \ +${hostArgs} \ +-H "Compression:GZIP" --write-out "RESPONSE_CODE=%{http_code}" 2>&1` + else + RESPONSE_HTTP=`curl -k --connect-timeout ${C_TMO} --max-time ${M_TMO} --data-binary @${logFile} ${u} \ +-H "Feed:${FEED_NAME}" -H "System:${SYSTEM}" -H "Environment:${ENVIRONMENT}" -H "Version:${VERSION}" \ +-H "MyHost:\"${myHost%"${myHost##*[![:space:]]}"}\"" \ +-H "MyIPaddress:\"${myIPaddress%"${myIPaddress##*[![:space:]]}"}\"" \ +${hostArgs} \ +-H "Compression:GZIP" --write-out "RESPONSE_CODE=%{http_code}" 2>&1` + fi + + # We first look for a positive response (ie 200) + RESPONSE_CODE=`echo ${RESPONSE_HTTP} | sed -e 's/.*RESPONSE_CODE=\(200\).*/\1/'` + if [ "${RESPONSE_CODE}" = "200" ] ;then + logmsg "Send status: [${RESPONSE_CODE}] SUCCESS Audit Log: ${logFile} Size: ${logSz} ProcessTime: ${ProcessTime} Feed: ${FEED_NAME}" + rm -f ${logFile} + return 0 + fi + + # If we can't find it in the output, look for the last response code + # We do this in the unlikely event that a corrupted arguement is passed to curl + RESPONSE_CODE=`echo ${RESPONSE_HTTP} | sed -e 
's/.*RESPONSE_CODE=\([0-9]\+\)$/\1/'` + if [ "${RESPONSE_CODE}" = "200" ] ;then + logmsg "Send status: [${RESPONSE_CODE}] SUCCESS Audit Log: ${logFile} Size: ${logSz} ProcessTime: ${ProcessTime}" + rm -f ${logFile} + return 0 + fi + + # Fall through ... + + # We failed to tranfer the processed log file, so emit a message to that effect + msg="Send status: [${RESPONSE_CODE}] FAILED Audit Log: ${logFile} Reason: curl returned http_code (${RESPONSE_CODE})" + logmsg "$msg" + + # Work out the next url to use + ((_i++)) + urls_idx=$((++urls_idx % urls_mod)) + u=${urls[$urls_idx]} + done + + # We also send an event into the security syslog destination + logger -p "authpriv.info" -t $Arg0 "$msg" + + return 9 +} + +# MAIN: + +# Set up a delay of between 7 - $MAX_SLEEP seconds +# The additional 7 seconds is to allow for log acqusition time + +RANDOM=`echo ${RANDOM}` +MOD=`expr ${MAX_SLEEP} - 7` +SLEEP=`expr \( ${RANDOM} % ${MOD} \) + 7` + +# Get a lock +stroom_get_lock + +# Create queue directory if need be +if [ ! -d ${STROOM_LOG_QUEUED} ]; then mkdir -p ${STROOM_LOG_QUEUED}; fi + +# We may need to sleep +if [ ${NoSleep} -eq 0 ]; then + logmsg "Will sleep for ${SLEEP}s to help balance network traffic" + sleep ${SLEEP} +fi + +# We now collect the logs from the source and move them +# into our queueing directory + +# Squid logs that have rolled over are of the form +# squid.log.N +# in our log source directory. So we need to rotate the logs +# then move all logs to the queue directory with a unique tag. As we +# move the logs we also want to concatentate the logs onto +# /var/log/squid/access.log + +uTag=`date +%s` +cd ${STROOM_LOG_SOURCE} +/usr/sbin/squid -k rotate +# Sleep a bit for the rotate to occur +sleep 2 +l=`ls access.log.* 2>/dev/null | sort --key=3 --field-separator=\. --reverse --numeric-sort` +if [ ! -z "${l}" ]; then + for f in ${l}; do + if [ -s $f ]; then + ./squidplusXML.pl < ${f} > ${STROOM_LOG_QUEUED}/${uTag}.${f} + # Concatenate onto primarylog file + cat ${f} >> ${PrimaryLog} && rm -f ${f} + else + # Remove empty files + rm -f ${f} + fi + done +fi + +# Go to the queue directory and post +cd ${STROOM_LOG_QUEUED} +# Gzip any non-gziped files +l=`find . -type f -regextype sed ! -regex "./[0-9]\+.*.gz$"` +if [ ! -z "$l" ]; then + echo $l | xargs gzip --force +fi + +for f in `find . -type f -regextype sed -regex "./[0-9]\+.*.gz$"`; do + + if [ -s ${f} ]; then + export ProcessTime=0 + send_to_stroom ${f} + else + rm -f ${f} + fi +done + +clean_store ${STROOM_LOG_QUEUED} ${FAILED_RETENTION} ${FAILED_MAX} +stroom_rm_lock +exit 0 diff --git a/source/proxy/squidplus-proxy/clientArtefacts/squidplusXML.pl b/source/proxy/squidplus-proxy/clientArtefacts/squidplusXML.pl new file mode 100755 index 0000000..e522d02 --- /dev/null +++ b/source/proxy/squidplus-proxy/clientArtefacts/squidplusXML.pl @@ -0,0 +1,229 @@ +#!/usr/bin/perl +# + +# Install yum install 'perl(XML::Simple)' + +# Release 1.3 20200605 burn@swtf.dyndns.org +# - Modify script to transform URL unwise characters ("{" | "}" | "|" | "\" | "^" | "[" | "]" | "`") +# into their % equivalent +# +# Perl script to take a specific output of a squid proxy server and enrich +# supplied IP addresses with resolved fully qualified domain names if possible and convert +# the data to a simple XML form. +# +# For the purpose of speed, we don't form a correct XML tree but generate individual nodes +# and print them as they are formed (then free them). +# We wrap the output using a simple root element. 
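+#
+# As an illustrative sketch only (the element names are those of the $event hash
+# built below; the values are made up and the XML declaration / enclosing root
+# element printed by the surrounding printf calls are omitted), each input log
+# line becomes a small record of the form:
+#
+#   <Evt>
+#     <dtg>1498198243.123</dtg>
+#     <rTime>57</rTime>
+#     <cIP>192.0.2.10</cIP>
+#     <cHost>client.example.com</cHost>
+#     <rMethod>GET</rMethod>
+#     <rURL>http://www.example.com/</rURL>
+#     ...
+#   </Evt>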
+# +# It is required that the squid proxy format is as per +# +# logformat squidplus %ts.%03tu %tr %>a/%>p %la/%>lp %Ss/%>Hs/%st/%>sh %mt %rm "%ru" "%un" %Sh "%>h" "%a/%>p Client source IP address '/' Client source port +# %la/%>lp Local IP address the client connected to '/' Local port number the client connected to +# +# %Ss/%>Hs/%st/%>sh Total size of request received from client. '/' Size of request headers received from client +# %mt MIME content type +# %rm Request method (GET/POST etc) +# "%ru" '"' Request URL from client (historic, filtered for logging) '"' +# "%un" '"' User name (any available) '"' +# %Sh Squid hierarchy status (DEFAULT_PARENT etc) +# "%>h" '"' Original received request header. '"' +# "%\n"; +printf "\n"; + +while (<>) { + + chomp; + my $line = $_; + + # Sometimes the last two header variables are just too large and squid + # just prints what's in it's buffers. To get around this we capture + # everything up to the hierarchy in a well defined regex but then + # capture the last two variables together then attempt to split them + if (($dtg, $responseTimeMilliSeconds, + $clientIP, $clientPnum, + $serverIP, $serverPnum, + $lclientIP, $lclientPnum, + $lserverIP, $lserverPnum, + $requestStatus, $StatusToClient, $StatusNextHop, + $szAllToClient, $szHdrsToClient, + $szAllFromClient, $szHdrsFromClient, + $mimeContent, + $requestMethod, $requestURL, + $user, + $hierarchy, + $x0 + ) = ($line =~ m/ + ^(\d+\.\d+)\s + (\d+)\s + ([^\/]+)\/([^\s]+)\s + ([^\/]+)\/([^\s]+)\s + ([^\/]+)\/([^\s]+)\s + ([^\/]+)\/([^\s]+)\s + ([^\/]+)\/([^\/]+)\/([^\s]+)\s + (\d+)\/(\d+)\s + (\d+)\/(\d+)\s + (\S+)\s + ([^\s]+)\s + "([^"]+)"\s + "([^"]+)"\s + (\S+)\s + (.*) + $ + /x + )) + { + + # Deal with non printing chars (control) in likely input locations + # Yes. Squid has a bug on requestMethod (%rm) on 400 Bad Request's + $requestMethod =~ s/([[:cntrl:]])/'&#' . ord($1) . ';'/gse; + $x0 =~ s/([[:cntrl:]])/'&#' . ord($1) . ';'/gse; + $requestURL =~ s/([[:cntrl:]])/'&#' . ord($1) . 
';'/gse; + # Correct for any unwise characters + # unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`" + $requestURL =~ s/{/'%7B'/gse; + $requestURL =~ s/}/'%7D'/gse; + $requestURL =~ s/\|/'%7C'/gse; + $requestURL =~ s/\\/'%5C'/gse; + $requestURL =~ s/\^/'%5E'/gse; + $requestURL =~ s/\[/'%5B'/gse; + $requestURL =~ s/\]/'%5D'/gse; + $requestURL =~ s/\`/'%60'/gse; + + # Now deal with the possible scenarios of + # "" "" + # "" " + # "" " + # " + $replyHdr = ""; + if ($x0 =~ m/"([^"\\]*(\\.[^"\\]*)*)"\s"([^"\\]*(\\.[^"\\]*)*)"$/) { + $receivedHdr = $1; + $replyHdr = $3; + } elsif ($x0 =~ m/"([^"\\]*(\\.[^"\\]*)*)"\s"(.*)$/) { + $receivedHdr = $1; + $replyHdr = $3; + } elsif ($x0 =~ m/"([^"\\]*(\\.[^"\\]*)*)"$/) { + $receivedHdr = $1; + } else { + $receivedHdr = $x0; + } + my $event = { + Evt => [ + { + dtg => [ $dtg ], + rTime => [ $responseTimeMilliSeconds ], + cIP => [ $clientIP ], + cHost => [ nslookup($clientIP) ], + cPort => [ $clientPnum ], + sIP => [ $serverIP ], + sHost => [ nslookup($serverIP) ], + sPort => [ $serverPnum ], + lcIP => [ $lclientIP ], + lcHost => [ nslookup($lclientIP) ], + lcPort => [ $lclientPnum ], + lsIP => [ $lserverIP ], + lsHost => [ nslookup($lserverIP) ], + lsPort => [ $lserverPnum ], + rStatus => [ $requestStatus ], + tCliStatus => [ $StatusToClient ], + nHopStatus => [ $StatusNextHop ], + SzAllTo => [ $szAllToClient ], + SzHdrsTo => [ $szHdrsToClient ], + SzAllFrom => [ $szAllFromClient ], + SzHdrsFrom => [ $szHdrsFromClient ], + mime => [ $mimeContent ], + rMethod => [ $requestMethod ], + rURL => [ $requestURL ], + user => [ $user ], + hierarch => [ $hierarchy ], + recHdr => [ $receivedHdr ], + rplHdr => [ $replyHdr ], + } + ] + }; + print XMLout($event, RootName => undef, NumericEscape => 2); + undef $event; + } +} +printf "\n"; diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Feed.ae1af17e-24a4-415e-8502-a11c39df1904.meta b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Feed.ae1af17e-24a4-415e-8502-a11c39df1904.meta new file mode 100644 index 0000000..b9ea4b6 --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Feed.ae1af17e-24a4-415e-8502-a11c39df1904.meta @@ -0,0 +1,13 @@ +{ + "type" : "Feed", + "uuid" : "ae1af17e-24a4-415e-8502-a11c39df1904", + "name" : "Squid-Plus-XML-V1.0-EVENTS", + "version" : "bc32fa00-8dac-46b1-87ac-867941ccc65d", + "description" : "Squid Proxy logs formed using the Squid-Plus XML proxy agent", + "classification" : "", + "encoding" : "UTF-8", + "contextEncoding" : "UTF-8", + "reference" : false, + "streamType" : "Raw Events", + "status" : "RECEIVE" +} diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Feed.ae1af17e-24a4-415e-8502-a11c39df1904.node b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Feed.ae1af17e-24a4-415e-8502-a11c39df1904.node new file mode 100644 index 0000000..443ce06 --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Feed.ae1af17e-24a4-415e-8502-a11c39df1904.node @@ -0,0 +1,4 @@ +name=Squid-Plus-XML-V1.0-EVENTS +path=Event Sources/Proxy/Squid-Plus-XML +type=Feed +uuid=ae1af17e-24a4-415e-8502-a11c39df1904 diff --git 
a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.meta b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.meta new file mode 100644 index 0000000..879264a --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.meta @@ -0,0 +1,12 @@ +{ + "type" : "Pipeline", + "uuid" : "c8b3c862-9249-40e6-912d-8dcf9552e153", + "name" : "Squid-Plus-XML-V1.0-EVENTS", + "version" : "08ac0b80-2a1f-485b-9b1e-67070f8558a6", + "description" : "Squid-Plus-XML pipeline - translate to events only\nDepends on Template Pipeline Event Data (XML)", + "parentPipeline" : { + "type" : "Pipeline", + "uuid" : "b07a43a9-f970-4642-9452-42a38fbd447e", + "name" : "Event Data (XML)" + } +} diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.node b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.node new file mode 100644 index 0000000..2a49acc --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.node @@ -0,0 +1,4 @@ +name=Squid-Plus-XML-V1.0-EVENTS +path=Event Sources/Proxy/Squid-Plus-XML +type=Pipeline +uuid=c8b3c862-9249-40e6-912d-8dcf9552e153 diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.xml b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.xml new file mode 100644 index 0000000..e7948aa --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.Pipeline.c8b3c862-9249-40e6-912d-8dcf9552e153.xml @@ -0,0 +1,18 @@ + + + + + + translationFilter + xslt + + + XSLT + 12a5fb79-fe4e-4a96-9d7e-277b46db19b3 + Squid-Plus-XML-V1.0-EVENTS + + + + + + diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.meta b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.meta new file mode 100644 index 0000000..e03770a --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.meta @@ -0,0 +1,7 @@ +{ + "type" : "XSLT", + "uuid" : "12a5fb79-fe4e-4a96-9d7e-277b46db19b3", + "name" : "Squid-Plus-XML-V1.0-EVENTS", + "version" : "218c4927-fed0-4059-8d37-b789572486ca", + "description" : "Translation for SquidPlus XML format events." 
+} diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.node b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.node new file mode 100644 index 0000000..b9cf08c --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.node @@ -0,0 +1,4 @@ +name=Squid-Plus-XML-V1.0-EVENTS +path=Event Sources/Proxy/Squid-Plus-XML +type=XSLT +uuid=12a5fb79-fe4e-4a96-9d7e-277b46db19b3 diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.xsl b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.xsl new file mode 100644 index 0000000..bf14c03 --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS.XSLT.12a5fb79-fe4e-4a96-9d7e-277b46db19b3.xsl @@ -0,0 +1,586 @@ +[586 lines of XSLT: the stylesheet markup is not recoverable from this capture. The surviving text shows that the translation classifies each SquidPlus record as either a "Receipt of information from a Resource via Proxy" or a "Transmission of information to a Resource via Proxy", and maps Squid/HTTP status codes to descriptive text (Continue, Switching Protocols, OK, Created, Moved Permanently, Bad Request, Unauthorized, Not Found, Internal Server Error, Bad Gateway, Service Unavailable, Gateway Timeout, and Squid-specific values such as "Squid: header parsing error" and "Squid: header size overflow detected while parsing").] diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS_Pipeline_Filter_1880903.ProcessorFilter.18809031-7c97-44db-9c9f-d53a825380be.meta
b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS_Pipeline_Filter_1880903.ProcessorFilter.18809031-7c97-44db-9c9f-d53a825380be.meta new file mode 100644 index 0000000..0d6ffa7 --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS_Pipeline_Filter_1880903.ProcessorFilter.18809031-7c97-44db-9c9f-d53a825380be.meta @@ -0,0 +1,36 @@ +{ + "uuid" : "18809031-7c97-44db-9c9f-d53a825380be", + "queryData" : { + "dataSource" : { + "type" : "StreamStore", + "uuid" : "0", + "name" : "StreamStore" + }, + "expression" : { + "type" : "operator", + "children" : [ { + "type" : "term", + "field" : "Feed", + "condition" : "IS_DOC_REF", + "value" : "Squid-Plus-XML-V1.0-EVENTS", + "docRef" : { + "type" : "Feed", + "uuid" : "ae1af17e-24a4-415e-8502-a11c39df1904", + "name" : "Squid-Plus-XML-V1.0-EVENTS" + } + }, { + "type" : "term", + "field" : "Type", + "condition" : "EQUALS", + "value" : "Raw Events" + } ] + } + }, + "priority" : 10, + "reprocess" : false, + "enabled" : true, + "deleted" : false, + "processorUuid" : "8524e610-b2e4-4ae0-9c44-e30185d6a3ba", + "pipelineUuid" : "c8b3c862-9249-40e6-912d-8dcf9552e153", + "pipelineName" : "Squid-Plus-XML-V1.0-EVENTS" +} diff --git a/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS_Pipeline_Filter_1880903.ProcessorFilter.18809031-7c97-44db-9c9f-d53a825380be.node b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS_Pipeline_Filter_1880903.ProcessorFilter.18809031-7c97-44db-9c9f-d53a825380be.node new file mode 100644 index 0000000..b184037 --- /dev/null +++ b/source/proxy/squidplus-proxy/stroomContent/Event_Sources/Proxy/Squid_Plus_XML/Squid_Plus_XML_V1_0_EVENTS_Pipeline_Filter_1880903.ProcessorFilter.18809031-7c97-44db-9c9f-d53a825380be.node @@ -0,0 +1,4 @@ +name=Squid-Plus-XML-V1.0-EVENTS Pipeline-Filter 1880903 +path=Event Sources/Proxy/Squid-Plus-XML +type=ProcessorFilter +uuid=18809031-7c97-44db-9c9f-d53a825380be