Flow name labeling

Introduction

The fnameLabel plugin helps you tag flows which originate from different files or interfaces. It is especially useful for the -R or -D options, discussed in the multifileIO tutorial.

Preparation

First, restore T2 into a pristine state by removing all unnecessary or older plugins from the plugin folder ~/.tranalyzer/plugins:

t2build -e -y

Are you sure you want to empty the plugin folder '/home/wurst/.tranalyzer/plugins' (y/N)? yes
Plugin folder emptied

Then compile the following plugins:

t2build tranalyzer2 basicFlow tcpStates fnameLabel txtSink

...
BUILD SUCCESSFUL

If you have not yet created separate data and results directories, do so now in another bash window; this will simplify your workflow:

mkdir ~/data ~/results

The anonymized sample PCAP used in this tutorial can be downloaded here: faf-exercise.pcap.

Please save it in your ~/data folder.

Now you are all set!

PCAP fragmentation

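As we are using the multifileIO option, the pcap needs to be split into chunks. The -C 1 option of tcpdump starts a new output file roughly every million bytes: the first chunk keeps the given name and later chunks get a numeric suffix starting at 1. Just invoke the commands below: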

mkdir ~/data/F

tcpdump -r ~/data/faf-exercise.pcap -w ~/data/F/faf-exercise.pcap -C 1

ls ~/data/F

faf-exercise.pcap  faf-exercise.pcap1  faf-exercise.pcap2  faf-exercise.pcap3  faf-exercise.pcap4  faf-exercise.pcap5

Now you are all set for the following chapter.

Plugin fnameLabel

The fnameLabel plugin not only tags flows with the file they originate from, but can also add a hash of the filename or a label, which is either the number contained in the filename or a specific character of it. It is predominantly used to automatically separate flows created with the -R or -D option, e.g., for training classifiers. To see the configuration, move to the fnameLabel plugin directory and look into the fnameLabel.h file.

fnameLabel

vi src/fnameLabel.h

...
/* ========================================================================== */
/* ------------------------ USER CONFIGURATION FLAGS ------------------------ */
/* ========================================================================== */

#define FNL_LBL        1 // 1: Output label derived from input
                         //    (Use fileNum for Tranalyzer -D option, otherwise refer to FNL_IDX)
#define FNL_IDX        0 // Use the 'FNL_IDX' letter of the filename as label
                         // (t2 -R/-i/-r options) [require FNL_LBL=1]
#define FNL_HASH       0 // 1: Output hash of filename
#define FNL_FLNM       1 // 1: Output filename
#define FNL_FREL       1 // Use absolute (0) or relative (1) filenames for fnLabel, fnHash and fnName

#define FNL_NAMELEN 1024 // Max length for filename

/* +++++++++++++++++++++ ENV / RUNTIME - conf Variables +++++++++++++++++++++ */

/*        No env / runtime configuration flags available for fnameLabel       */

/* ========================================================================== */
/* ------------------------- DO NOT EDIT BELOW HERE ------------------------- */
/* ========================================================================== */
...

With -D, the label is the file number from the filename pattern. In all other cases, the constant FNL_IDX defines the position of the character in the filename that is used as the label. If FNL_FREL=1, the position refers to the relative filename (position 0 is the first character after the last slash). If FNL_FREL=0, the position refers to the absolute path. For example:

Filename                             FNL_IDX  FNL_FREL=1  FNL_FREL=0
/home/user/data/F/faf-exercise.pcap  0        f           /
/home/user/data/F/faf-exercise.pcap  1        a           h
If you like, you may also switch on FNL_HASH, which outputs a number representing the filename. Here, we leave everything at its default value.

fnameLabel using the -D option

t2 -D ~/data/F/faf-exercise.pcap1,5 -w ~/results/

================================================================================
Tranalyzer 0.8.14 (Anteater), Tarantula. PID: 56586
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Active plugins:
    01: basicFlow, 0.8.14
    02: tcpStates, 0.8.14
    03: fnameLabel, 0.8.14
    04: txtSink, 0.8.14
[INF] IPv4 Ver: 5, Rev: 16122020, Range Mode: 0, subnet ranges loaded: 406105 (406.11 K)
[INF] IPv6 Ver: 5, Rev: 17122020, Range Mode: 0, subnet ranges loaded: 51345 (51.34 K)
Processing file: /home/wurst/data/F/faf-exercise.pcap1
Link layer type: Ethernet [EN10MB/1]
Dump start: 1258594168.120912 sec (Thu 19 Nov 2009 01:29:28 GMT)
Processing file: /home/wurst/data/F/faf-exercise.pcap2
Processing file: /home/wurst/data/F/faf-exercise.pcap3
Processing file: /home/wurst/data/F/faf-exercise.pcap4
Processing file: /home/wurst/data/F/faf-exercise.pcap5
Dump stop : 1258594491.683288 sec (Thu 19 Nov 2009 01:34:51 GMT)
Total dump duration: 323.562376 sec (5m 23s)
...
Number of processed   flows: 4
Number of processed A flows: 2 [50.00%]
Number of processed B flows: 2 [50.00%]
Number of request     flows: 2 [50.00%]
Number of reply       flows: 2 [50.00%]
...

Only 4 flows? Why is that? Running t2 -r on the complete faf-exercise.pcap yields 72 flows. The difference arises because most of the flows are generated in the first chunk, F/faf-exercise.pcap, which has no numeric suffix. We started with index 1, remember?! Gotcha. If you want to process all the chunks, modify the -D option as follows: -D ~/data/F/faf-exercise.pcap,5

If you look now into the resulting flow file ~/results/faf-exercise_flows.txt you will see flows with fnLabel 1 and 5, which match the number in fname (the filename). This means each of those files caused a flow to be created.

tcol ~/results/faf-exercise_flows.txt

%dir  flowInd  flowStat            timeFirst          timeLast           duration    numHdrDesc  numHdrs  hdrDesc       srcMac             dstMac             ethType  ethVlanID  srcIP          srcIPCC  srcIPOrg           srcPort  dstIP          dstIPCC  dstIPOrg           dstPort  l4Proto  tcpStates  fnLabel  fname
A     1        0x0400000000004000  1258594168.120912  1258594185.427506  17.306594   1           3        eth:ipv4:tcp  00:19:e3:e7:5d:23  00:08:74:38:01:b4  0x0800              143.166.11.10  us       "Dell"             64334    192.168.1.105  07       "Private network"  49330    6        0x03       1        "faf-exercise.pcap1"
B     1        0x0400000000004001  1258594168.121080  1258594191.015208  22.894128   1           3        eth:ipv4:tcp  00:08:74:38:01:b4  00:19:e3:e7:5d:23  0x0800              192.168.1.105  07       "Private network"  49330    143.166.11.10  us       "Dell"             64334    6        0x43       1        "faf-exercise.pcap1"
A     2        0x0400000000004000  1258594185.618346  1258594185.618346  0.000000    1           3        eth:ipv4:tcp  00:08:74:38:01:b4  00:19:e3:e7:5d:23  0x0800              192.168.1.105  07       "Private network"  49329    143.166.11.10  us       "Dell"             21       6        0x03       5        "faf-exercise.pcap5"
B     2        0x0400000000004001  1258594185.427515  1258594491.683288  306.255773  1           3        eth:ipv4:tcp  00:19:e3:e7:5d:23  00:08:74:38:01:b4  0x0800              143.166.11.10  us       "Dell"             21       192.168.1.105  07       "Private network"  49329    6        0x43       5        "faf-exercise.pcap5"

Note that if you set FNL_FREL to 0, then the absolute path, e.g., /home/user/data/F/faf-exercise.pcap1, would be printed instead of the relative one, e.g., faf-exercise.pcap1.
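
To check how many flows each input file contributed, the fnLabel column can be tallied with plain awk. The sketch below assumes the flow file is tab-separated and that its header line contains the column name fnLabel, as in the output above; count_by_label is a hypothetical helper name, not a Tranalyzer command.

```shell
# Count flows per fnLabel value in a flow file.
# Assumptions: tab-separated columns, header line containing "fnLabel".
# count_by_label is a hypothetical helper, not part of Tranalyzer.
count_by_label() {
    awk -F '\t' '
        !col {
            # Still looking for the header: find the fnLabel column
            for (i = 1; i <= NF; i++)
                if ($i == "fnLabel") col = i
            next
        }
        { n[$col]++ }                       # one more flow for this label
        END { for (l in n) print l, n[l] }  # label, number of flows
    ' "$1" | sort
}

# Example call (requires the flow file produced above):
# count_by_label ~/results/faf-exercise_flows.txt
```

Each output line holds a label followed by the number of flows (A and B counted separately) carrying that label.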

fnameLabel using the -R option

In case of the -R option, we first have to create a pcap file list.

t2caplist ~/data/F/*[0-9] > ~/data/F/faf-exercise.txt

cat ~/data/F/faf-exercise.txt

/home/user/data/F/faf-exercise.pcap1
/home/user/data/F/faf-exercise.pcap2
/home/user/data/F/faf-exercise.pcap3
/home/user/data/F/faf-exercise.pcap4
/home/user/data/F/faf-exercise.pcap5

Next, FNL_IDX needs to be set to the character position where the number is expected. With FNL_FREL=1 and counting from 0, this is position 17 in faf-exercise.pcap1. Then rebuild the plugin and invoke T2 on this very list.
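
If you do not want to count characters by hand, the position can be computed in the shell. The sketch below assumes FNL_FREL=1 and a single-digit chunk number at the very end of the filename; last_char_idx is a hypothetical helper name, not a Tranalyzer command.

```shell
# 0-based position of the last character of the relative filename,
# i.e. of the chunk number in names such as faf-exercise.pcap1.
# Assumes FNL_FREL=1 and a single-digit suffix.
# last_char_idx is a hypothetical helper, not part of Tranalyzer.
last_char_idx() {
    name=${1##*/}               # strip the directory part
    echo "$((${#name} - 1))"    # length minus one = index of last character
}

last_char_idx /home/user/data/F/faf-exercise.pcap1   # prints 17
```

With chunk numbers of two or more digits a single character can no longer hold the whole number, so this shortcut only fits small examples like the one here.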

t2conf fnameLabel -D FNL_IDX=17 && t2build fnameLabel

t2 -R ~/data/F/faf-exercise.txt -w ~/results/

================================================================================
Tranalyzer 0.8.14 (Anteater), Tarantula. PID: 27767
================================================================================
[INF] Creating flows for L2, IPv4, IPv6
Checking list file
    checking file '/home/user/data/F/faf-exercise.pcap1'
    checking file '/home/user/data/F/faf-exercise.pcap2'
    checking file '/home/user/data/F/faf-exercise.pcap3'
    checking file '/home/user/data/F/faf-exercise.pcap4'
    checking file '/home/user/data/F/faf-exercise.pcap5'
Active plugins:
    01: basicFlow, 0.8.14
    02: tcpStates, 0.8.14
    03: fnameLabel, 0.8.14
    04: txtSink, 0.8.14
[INF] IPv4 Ver: 5, Rev: 16122020, Range Mode: 0, subnet ranges loaded: 406105 (406.11 K)
[INF] IPv6 Ver: 5, Rev: 17122020, Range Mode: 0, subnet ranges loaded: 51345 (51.34 K)
Processing list file: /home/user/data/F/faf-exercise.txt
Processing file no. 1 of 5: /home/user/data/F/faf-exercise.pcap1
Link layer type: Ethernet [EN10MB/1]
Dump start: 1258594168.120912 sec (Thu 19 Nov 2009 01:29:28 GMT)
Processing file no. 2 of 5: /home/user/data/F/faf-exercise.pcap2
Link layer type: Ethernet [EN10MB/1]
Processing file no. 3 of 5: /home/user/data/F/faf-exercise.pcap3
Link layer type: Ethernet [EN10MB/1]
Processing file no. 4 of 5: /home/user/data/F/faf-exercise.pcap4
Link layer type: Ethernet [EN10MB/1]
Processing file no. 5 of 5: /home/user/data/F/faf-exercise.pcap5
Link layer type: Ethernet [EN10MB/1]
Dump stop : 1258594491.683288 sec (Thu 19 Nov 2009 01:34:51 GMT)
Total dump duration: 323.562376 sec (5m 23s)
Finished processing. Elapsed time: 0.002678 sec
Finished unloading flow memory. Time: 0.002699 sec
...
Number of processed   flows: 4
Number of processed A flows: 2 [50.00%]
Number of processed B flows: 2 [50.00%]
Number of request     flows: 2 [50.00%]
Number of reply       flows: 2 [50.00%]
...

And you see the same result as before with the -D option.

tcol ~/results/faf-exercise_flows.txt

%dir  flowInd  flowStat            timeFirst          timeLast           duration    numHdrDesc  numHdrs  hdrDesc       srcMac             dstMac             ethType  ethVlanID  srcIP          srcIPCC  srcIPOrg           srcPort  dstIP          dstIPCC  dstIPOrg           dstPort  l4Proto  tcpStates  fnLabel  fname
A     1        0x0400000000004000  1258594168.120912  1258594185.427506  17.306594   1           3        eth:ipv4:tcp  00:19:e3:e7:5d:23  00:08:74:38:01:b4  0x0800              143.166.11.10  us       "Dell"             64334    192.168.1.105  07       "Private network"  49330    6        0x03       1        "faf-exercise.pcap1"
B     1        0x0400000000004001  1258594168.121080  1258594191.015208  22.894128   1           3        eth:ipv4:tcp  00:08:74:38:01:b4  00:19:e3:e7:5d:23  0x0800              192.168.1.105  07       "Private network"  49330    143.166.11.10  us       "Dell"             64334    6        0x43       1        "faf-exercise.pcap1"
A     2        0x0400000000004000  1258594185.618346  1258594185.618346  0.000000    1           3        eth:ipv4:tcp  00:08:74:38:01:b4  00:19:e3:e7:5d:23  0x0800              192.168.1.105  07       "Private network"  49329    143.166.11.10  us       "Dell"             21       6        0x03       5        "faf-exercise.pcap5"
B     2        0x0400000000004001  1258594185.427515  1258594491.683288  306.255773  1           3        eth:ipv4:tcp  00:19:e3:e7:5d:23  00:08:74:38:01:b4  0x0800              143.166.11.10  us       "Dell"             21       192.168.1.105  07       "Private network"  49329    6        0x43       5        "faf-exercise.pcap5"

Conclusion

Don’t forget to reset FNL_IDX to its default value:

t2conf fnameLabel -D FNL_IDX=0 && t2build fnameLabel

Or use the new command:

t2conf --reset fnameLabel

Have fun!