Post-processing with Tawk
Contents
- Prerequisites
- General introduction
- Examples
- Print all 5 tuples (source and destination IP and ports, protocol)
- Print the hosts involved in the most flows
- Ignore all flows between private IPs
- Print the source and destination addresses of all DNS flows related to Facebook
- Replace the protocol number by its string representation, e.g., 6 -> TCP
- Replace the Unix timestamp used for timeFirst and timeLast by their value in UTC
- Replace the Unix timestamp used for timeFirst and timeLast by their values in localtime
- Print the 10 hosts sending the most bytes over UDP
- Inspect the flow number 1234 in the flow file
- Follow a specific flow, e.g., the flow with flow index 1234, in the packet file
- Inspect the packet number 1234 in the packet file
- Follow a flow (similar to Wireshark follow TCP/UDP stream):
- Recreate a binary file transferred in a B flow:
- Extract all flows whose HTTP Host: header matches google using Wireshark field names
- Extract the DNS query field from all flows where at least one DNS answer was seen (using Wireshark field names)
- Open all ICMP flows involving the network 1.2.3.4/24 in Wireshark
- Create a PCAP file with all TCP flows with port 80 or 8080
- Writing a Tawk function
- Using Tawk within scripts
- Using Tawk with non-Tranalyzer files
- Mapping external column names to Tranalyzer column names
- Using Tawk with Bro/Zeek files
- Examples
- See also
This tutorial presents tawk functionality through various scenarios.
tawk works just like awk, but provides access to the columns via their names.
In addition, it provides access to helper functions, such as host() or port().
For an overview, refer to the Alphabetical list of Tawk functions.
Custom functions can be added in the folder named t2custom, where they will be automatically loaded.
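For example, where plain awk requires the positional index of a column, tawk lets you address it by name. The column position used in the plain awk line below is hypothetical and depends on the plugins loaded:

awk -F'\t' '!/^%/ && $12 == 6' FILE_flows.txt    # plain awk: column 12 is assumed to hold l4Proto, header rows skipped
tawk '$l4Proto == 6' FILE_flows.txt              # tawk: use the column name directly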
Prerequisites
This tutorial assumes a working knowledge of awk
.
Dependencies
gawk version 4.1 or newer is required.
Distribution | Installation command | Notes |
Kali/Ubuntu | sudo apt-get install gawk | |
Arch | sudo pacman -S gawk | |
Fedora/Red Hat | sudo yum install gawk | |
Gentoo | sudo emerge gawk | |
openSUSE | sudo zypper install gawk | |
macOS | brew install gawk | Homebrew package manager |
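To check which gawk version is installed (it should report at least 4.1), run:

gawk --version | head -n1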
Installation
The recommended way to install tawk is to install t2_aliases as documented in README.md:

Append the following lines to ~/.bashrc:

if [ -f "$T2HOME/scripts/t2_aliases" ]; then
    . $T2HOME/scripts/t2_aliases    # Note the leading `.'
fi

Make sure to replace $T2HOME with the actual path, e.g., $HOME/tranalyzer-0.9.0/plugins.
Documentation (man pages)
The man pages for tawk and t2nfdump (more on that later) can be installed by running ./install.sh man.
Once installed, they can be consulted by running man tawk and man t2nfdump respectively.
General introduction
Command line options
First, run tawk -h to list the available command line options:
tawk -h
Usage:
tawk [OPTION...] 'program' file_flows.txt
tawk [OPTION...] -I file_flows.txt 'program'
Input arguments:
-I file Alternative way to specify the input file
Optional arguments:
-N num Row number where column names are to be found
-s char First character for the row listing the columns name
-F fs Use 'fs' as input field separator
-O fs Use 'fs' as output field separator
--csv Set input and output separators to ',' and
extract names from first row
--zeek Configure tawk to work with Bro/Zeek log files
-f file Read (t)awk program from file
-n Load nfdump functions
-e Load examples functions
-H Do not output the header (column names)
-c[=u] Output command line as a comment
(use -c=u for UTC instead of localtime)
-t Validate column names (slow)
-r Try renaming invalid columns (suffix them with '_') (slow)
Tranalyzer specific arguments:
-k Run Wireshark on the extracted data
-x outfile Create a PCAP file with the selected flows/packets
-X xerfile Specify the '.xer' file to use with -k and -x options
-P Extract specific packets instead of whole flows
-b Always extract both directions (A and B flows)
-V vname[=value] Display Tranalyzer variable 'vname' documentation
-L Decode all variables from Tranalyzer log file
Help and documentation arguments:
-l[=n], --list[=n] List column names and numbers
-g[=n], --func[=n] List available functions
-d fname Display function 'fname' documentation
-D Display tawk PDF documentation
-?, -h, --help Show help options and exit
-s and -N options
The -s option can be used to specify the starting character(s) of the row containing the column names (default: %).
If several rows start with the specified character(s), then the last one is used as column names.
To change this behavior, the line number can be specified as well with the help of the -N option.
For example, if rows 1 to 5 start with # and row 3 contains the column names, specify the separator as follows: tawk -s "#" -N 3.
If the row with column names does not start with a special character, use -s "".
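For instance, for a hypothetical tab-separated file whose column names are listed on the first row without any leading character (the duration column is assumed to exist in that file):

tawk -s "" -N 1 '{ print $duration }' file.txt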
What features (columns) are available?
tawk -l FILE_flows.txt
What functions are available?
tawk -g FILE_flows.txt
Alternatively, refer to the Alphabetical list of Tawk functions.
How to use a specific function?
tawk -d function_name
How to interpret a specific column?
tawk -V colName
tawk -V colName=value
How to decode all aggregated fields in Tranalyzer log file?
tawk -L out_log.txt
t2 -r file.pcap | tawk -L
Examples
Print all 5 tuples (source and destination IP and ports, protocol)
tawk '{ print tuple5() }' FILE_flows.txt
Print the hosts involved in the most flows
tawk '{ aggr($srcIP); aggr($dstIP) }' FILE_flows.txt
Ignore all flows between private IPs
tawk 'not(privip($srcIP) && privip($dstIP))' FILE_flows.txt
Print the source and destination addresses of all DNS flows related to Facebook
tawk 'wildcard("^dns.*") ~ /facebook/ { print tuple2() }' FILE_flows.txt
Replace the protocol number by its string representation, e.g., 6 -> TCP
tawk '{ $l4Proto = proto2str($l4Proto); print }' FILE_flows.txt
Replace the Unix timestamp used for timeFirst and timeLast by their value in UTC
tawk '{ $timeFirst = utc($timeFirst); $timeLast = utc($timeLast); print }' FILE_flows.txt
Replace the Unix timestamp used for timeFirst and timeLast by their values in localtime
tawk '{ $timeFirst = localtime($timeFirst); $timeLast = localtime($timeLast); print }' FILE_flows.txt
Print the 10 hosts sending the most bytes over UDP
tawk -H ' udp() && !bitsallset($flowStat, 1) { aggr($srcIP, $numBytesSnt, 10); aggr($dstIP, $numBytesRcvd, 10); } ' FILE_flows.txt
Inspect the flow number 1234 in the flow file
tawk 'flow(1234)' FILE_flows.txt
Follow a specific flow, e.g., the flow with flow index 1234, in the packet file
tawk 'flow(1234)' FILE_packets.txt
Inspect the packet number 1234 in the packet file
tawk 'packet(1234)' FILE_packets.txt
Follow a flow (similar to Wireshark follow TCP/UDP stream):
tawk 'follow_stream(1)' FILE_packets.txt
Recreate a binary file transferred in a B flow:
tawk 'follow_stream(1, 3, "B")' FILE_packets.txt | xxd -p -r > out.data
Extract all flows whose HTTP Host: header matches google using Wireshark field names
tawk 'shark("http.host") ~ /google/' FILE_flows.txt
Extract the DNS query field from all flows where at least one DNS answer was seen (using Wireshark field names)
tawk 'shark("dns.count.answers") { print shark("dns.qry.name") }' FILE_flows.txt
Open all ICMP flows involving the network 1.2.3.4/24 in Wireshark
tawk -k 'icmp() && host("1.2.3.4/24")' FILE_flows.txt
Create a PCAP file with all TCP flows with port 80 or 8080
tawk -x file.pcap 'tcp() && port("80;8080")' FILE_flows.txt
Writing a Tawk function
- Ideally one function per file (where the filename is the name of the function)
- Private functions are prefixed with an underscore
- Always declare local variables 8 spaces after the function arguments
- Local variables are prefixed with an underscore
- Use uppercase letters and two leading and two trailing underscores for global variables
- Include all referenced functions
- Files should be structured as follows:
#!/usr/bin/env awk
#
# Function description
#
# Parameters:
# - arg1: description
# - arg2: description (optional)
#
# Dependencies:
# - plugin1
# - plugin2 (optional)
#
# Examples:
# - tawk `funcname()' file.txt
# - tawk `{ print funcname() }' file.txt
@include "hdr"
@include "_validate_col"
function funcname(arg1, arg2,        _locvar1, _locvar2) {
    _locvar1 = _validate_col("colname1;altcolname1", _my_colname1)
    _validate_col("colname2")
    if (hdr()) {
        if (__PRIHDR__) print "header"
    } else {
        print "something", _locvar1, $colname2
    }
}
- Copy your files into the t2custom folder.
- To have your functions automatically loaded, include them in the file t2custom/t2custom.load.
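As a concrete illustration, here is a minimal sketch of a hypothetical function l4proto() written according to these conventions; it assumes the l4Proto column (basicFlow plugin) and would be saved as t2custom/l4proto and listed in t2custom/t2custom.load:

#!/usr/bin/env awk
#
# Prints the layer 4 protocol of every flow
#
# Dependencies:
#   - basicFlow
#
# Examples:
#   - tawk 'l4proto()' file.txt

@include "hdr"
@include "_validate_col"

function l4proto() {
    _validate_col("l4Proto")
    if (hdr()) {
        if (__PRIHDR__) print "l4Proto"
    } else {
        print $l4Proto
    }
}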
Using Tawk within scripts
To use tawk from within a script:

- Create a TAWK variable pointing to the script: TAWK="$T2HOME/scripts/tawk/tawk" (make sure to replace $T2HOME with the actual path to the scripts folder)
- Call tawk as follows: $TAWK 'dport(80)' file.txt
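For instance, a minimal wrapper script along these lines (the installation path, script name and behavior are assumptions; adapt them to your setup) could look as follows:

#!/usr/bin/env bash
# Hypothetical helper: print the 2-tuple of all flows with destination port 80
# from the flow file given as the first argument
TAWK="$HOME/tranalyzer-0.9.0/scripts/tawk/tawk"    # adjust to your installation
"$TAWK" 'dport(80) { print tuple2() }' "$1"

It would then be invoked as ./http_flows.sh FILE_flows.txt (the script name is hypothetical).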
Using Tawk with non-Tranalyzer files
tawk can also be used with files which were not produced by Tranalyzer.

- The input field separator can be specified with the -F option, e.g., tawk -F ',' 'program' file.csv
- The row listing the column names can start with any character, specified with the -s option, e.g., tawk -s '#' 'program' file.txt
- Column names must not be equal to a function or builtin name (tawk will try renaming them with a trailing underscore if the -r option is used (slow))
- Valid column names must start with a letter (a-z, A-Z) and can be followed by any number of alphanumeric characters or underscores
- If the column names are different from those used by Tranalyzer, refer to the next section.
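For example, a hypothetical comma-separated file conns.csv whose first row lists the column names src, dst and bytes could be queried as follows (using the --csv option described above):

tawk --csv '$bytes > 1000 { print $src, $dst }' conns.csv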
Mapping external column names to Tranalyzer column names
If the column names are different from those used by Tranalyzer, a mapping between the different names can be made in the file scripts/tawk/my_vars.
The format of the file is as follows:

BEGIN {
    _my_srcIP = "non_t2_name_for_srcIP"
    _my_dstIP = "non_t2_name_for_dstIP"
    ...
}
Once edited, run tawk with the -i $T2HOME/scripts/tawk/my_vars option and the external column names will be automatically used by tawk functions, such as tuple2().
For more details, refer to the my_vars file itself.
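For example, assuming a hypothetical CSV export where the source and destination addresses are stored in columns named sa and da, my_vars could contain:

BEGIN {
    _my_srcIP = "sa"    # column holding the source address
    _my_dstIP = "da"    # column holding the destination address
}

The export could then be processed with:

tawk -i $T2HOME/scripts/tawk/my_vars --csv '{ print tuple2() }' export.csv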
Using Tawk with Bro/Zeek files
To use tawk with Bro/Zeek log files, use the --bro or --zeek option:
tawk --bro '{ program }' file.log
tawk --zeek '{ program }' file.log
Examples
Pivoting (variant 1):

First, extract an attribute of interest, e.g., an unresolved IP address in the Host: field of the HTTP header:

tawk 'aggr($httpHosts)' FILE_flows.txt | tawk '{ print unquote($1); exit }'

Then, put the result of the last command in the badguy variable and use it to extract flows involving this IP:

tawk -v badguy="$(!!)" 'host(badguy)' FILE_flows.txt

Pivoting (variant 2):

First, extract an attribute of interest, e.g., an unresolved IP address in the Host: field of the HTTP header, and store it into a badip variable:

badip="$(tawk 'aggr($httpHosts)' FILE_flows.txt | tawk '{ print unquote($1); exit }')"

Then, use the badip variable to extract flows involving this IP:

tawk -v badguy="$badip" 'host(badguy)' FILE_flows.txt
Aggregate the number of bytes sent between source and destination addresses (independent of the protocol and port) and output the top 10 results:
tawk 'aggr($srcIP4 OFS $dstIP4, $numBytesSnt, 10)' FILE_flows.txt
Aggregate the number of bytes, packets and flows sent over TCP between source and destination addresses (independent of the port) and output the top 20 results (output sorted according to numBytesSnt):

tawk 'tcp() { aggr(tuple2(), $numBytesSnt OFS $numPktsSnt OFS "Flows", 20) }' FILE_flows.txt
Sort the flow file according to the duration (longest flows first) and output the top 5 results:
tawk 't2sort(duration, 5)' FILE_flows.txt
Extract all TCP flows:
tawk 'tcp()' FILE_flows.txt
Extract all flows whose destination port is between 6000 and 6008 (inclusive):
tawk 'dport("6000-6008")' FILE_flows.txt
Extract all flows whose destination port is 53, 80 or 8080:
tawk 'dport("53;80;8080")' FILE_flows.txt
Extract all flows involving an IP in the subnet 192.168.1.0/24 (using the host() or net() function):

tawk 'host("192.168.1.0/24")' FILE_flows.txt
tawk 'net("192.168.1.0/24")' FILE_flows.txt

Extract all flows whose destination IP is in subnet 192.168.1.0/24 (using the dhost() or dnet() function):

tawk 'dhost("192.168.1.0/24")' FILE_flows.txt
tawk 'dnet("192.168.1.0/24")' FILE_flows.txt

Extract all flows whose source IP is in subnet 192.168.1.0/24 (using the shost() or snet() function):

tawk 'shost("192.168.1.0/24")' FILE_flows.txt
tawk 'snet("192.168.1.0/24")' FILE_flows.txt

Extract all flows whose source IP is in subnet 192.168.1.0/24 (using the ipinrange() function):

tawk 'ipinrange($srcIP4, "192.168.1.0", "192.168.1.255")' FILE_flows.txt

Extract all flows whose source IP is in subnet 192.168.1.0/24 (using the ipinnet() function):

tawk 'ipinnet($srcIP4, "192.168.1.0", "255.255.255.0")' FILE_flows.txt

Extract all flows whose source IP is in subnet 192.168.1.0/24 (using the ipinnet() function and a hex mask):

tawk 'ipinnet($srcIP4, "192.168.1.0", 0xffffff00)' FILE_flows.txt

Extract all flows whose source IP is in subnet 192.168.1.0/24 (using the ipinnet() function and the CIDR notation):

tawk 'ipinnet($srcIP4, "192.168.1.0/24")' FILE_flows.txt

Extract all flows whose source IP is in subnet 192.168.1.0/24 (using the ipinnet() function and a CIDR mask):

tawk 'ipinnet($srcIP4, "192.168.1.0", 24)' FILE_flows.txt
For more examples, refer to the tawk -d option, e.g., tawk -d aggr, where every function is documented and comes with a set of examples.
For more complex examples, have a look at the scripts/t2fm/tawk/ folder.
The complete documentation can be consulted by running tawk -d all.