Extract CLI commands from Session Logs

2016-08-20

Today, I'd like to write about a small and very basic topic. From time to time, everybody needs the output of certain CLI commands from a set of devices, e.g. for troubleshooting. Quite often, direct access to the devices isn't possible, so you need to ask somebody else to collect the data. In many cases, you just get plain session logs, e.g. from PuTTY, and they are often quite unstructured, which makes working with the data hard.

In this post, I'll show you how to normalize this type of unstructured data using a simple Python script. Whether you work with the data programmatically or by hand afterwards, a clear structure is always very useful. As always, the code example from this post is part of my Python script examples repository on GitHub.

The raw data and the expected output

Let's take a look at our input data. For demonstration purposes, I collected the output of various commands from three of my lab switches using PuTTY. I enabled session logging in the settings and added the hostname to the log file name (in PuTTY, the placeholder for the hostname is &H). Furthermore, PuTTY should log only printable output. This is required to avoid encoding issues when you try to split the log files. The following screenshot shows the PuTTY configuration.

Now you can log in to the switch. Before you start collecting data, disable paging and extend the terminal width to make the length of a line more predictable. Use the following commands on Cisco IOS/IOS XE:

# terminal length 0
# terminal width 512

By default, Cisco IOS doesn't send log messages to terminal sessions over IP (such as SSH or Telnet) unless terminal monitor is enabled, so we don't need to account for them while collecting the output. Now you can execute the required commands in your PuTTY session. The output is logged automatically to the specified directory, and the result should look similar to the following screenshot:

Split the log files using regular expressions

Now let's have a look at the function that we use to split the content of the log files. We will use regular expressions for this. I assume that a single log file contains nothing but the output of multiple commands and that the default Cisco IOS command prompt is used. With these assumptions, we can split the log files using the following regular expression:

\S+#
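To get a feeling for which lines this pattern treats as a prompt, you can test it against a few lines with re.match(). The lines below are hypothetical examples, not taken from my lab logs:

```python
import re

# Hypothetical lines from a Cisco IOS session log (illustration only)
samples = [
    "SwitchA#show clock",                 # privileged EXEC prompt: matches
    "SwitchA(config)#interface Gi1/0/1",  # configuration mode prompt: matches as well
    "SwitchA>show clock",                 # user EXEC prompt ends with ">": no match
    "SwitchA uptime is 1 day",            # regular output line: no match
]

for line in samples:
    # re.match() anchors the pattern at the start of the line
    print(bool(re.match(r"\S+#", line)), line)
```

Configuration mode prompts therefore count as command boundaries, too. Keep in mind that any output line starting with a word directly followed by "#", for example a banner line made of "#" characters, would also match.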

The re module of the Python standard library provides a split() function that does exactly what we need. We define a function that splits a string (the content of the log file) at every match of our regular expression. In the code, the pattern is prefixed with a newline (\n\S+#), so a prompt is only recognized at the beginning of a line. re.split() returns a list that contains all parts of the string without the text that matched the regular expression. That's great, because the command is then, in most cases, the first line of each part, and the output is everything that follows the first line. Because we are only interested in the output of the commands, we skip any command that doesn't produce any output. This also filters out some of the noise caused by the autocompletion feature of the CLI. The resulting function looks similar to the following code snippet:

import re


def split_config_file(raw_data):
    """splits the outputs of multiple commands from a single session log"""
    # a prompt (e.g. "SwitchA#") at the beginning of a line starts a new command
    split_config = re.split(r"\n\S+#", raw_data)

    commands = dict()
    for single_command in split_config:
        lines = single_command.splitlines()
        # skip the command if no output is provided
        if len(lines) > 1:
            # ensure that only the command is used as key
            if re.match(r"^\S+#", lines[0]):
                # the first part still contains the prompt, e.g. "SwitchA#show clock"
                cmd = lines[0].split("#", 1)[1]
            else:
                cmd = lines[0]

            commands[cmd] = "\n".join(lines[1:])

    return commands

The function returns a dictionary with the commands as keys and their output as values. We assume that every file contains only the output of a single host, so each command is unique and can be used as a dictionary key. If a command was executed more than once, the output of the last execution overwrites the previous ones.
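To see why the function checks the first line for a prompt, it helps to look at what re.split() returns for a small log. The sample below is hypothetical. Note that the first element still contains the prompt because it isn't preceded by a newline, and that the trailing prompt produces an empty element, which the function skips:

```python
import re

# Hypothetical session log with two commands from a switch named "SwitchA"
sample = (
    "SwitchA#show clock\n"
    "*10:00:00.000 UTC Sat Aug 20 2016\n"
    "SwitchA#show version | include uptime\n"
    "SwitchA uptime is 1 day\n"
    "SwitchA#"
)

parts = re.split(r"\n\S+#", sample)
for part in parts:
    print(repr(part))
# 'SwitchA#show clock\n*10:00:00.000 UTC Sat Aug 20 2016'
# 'show version | include uptime\nSwitchA uptime is 1 day'
# ''
```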

The first test run

Now we run the script for the first time. With my test data, the first run produces a result that looks similar to the following:

Okay, this looks good, but the result contains an element that we don't need. The first entry for SwitchA contains the PuTTY logging header, which is not a real command. Remember that re.split() splits the string at every match of the regular expression, so everything before the first prompt becomes the first entry in the list, even if it isn't a valid command. For this reason, we add special treatment for the first element of the list: if it doesn't start with a prompt, we ignore it.
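The effect is easy to reproduce. In the hypothetical log below, the first line imitates a PuTTY log header (the exact header text is an assumption) and the second line is a login prompt. re.split() returns this noise as the first element, and the first version of the function would use the header line as a command key:

```python
import re

# Hypothetical log: PuTTY-style header (assumed format) and login line before the first prompt
raw = (
    "=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2016.08.20 10:00:00 =~=~=~=~=~=~=~=~=~=~=~=\n"
    "login as: admin\n"
    "SwitchA#show clock\n"
    "*10:00:00.000 UTC Sat Aug 20 2016\n"
    "SwitchA#"
)

parts = re.split(r"\n\S+#", raw)
print(parts[0].splitlines())  # header and login line, but no prompt

# the fix: drop the first element if it doesn't start with a prompt
if not re.match(r"\S+#", parts[0]):
    parts = parts[1:]
print(parts[0].splitlines()[0])  # show clock
```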

To make the function more resilient, we also add a quick check that re.split() returned anything. The function now looks like the following snippet:

import re


def split_config_file(raw_data):
    """splits the outputs of multiple commands from a single session log"""
    split_config = re.split(r"\n\S+#", raw_data)

    commands = dict()
    # re.split() always returns at least one element, so this check is purely defensive
    if len(split_config) > 0:
        # ignore the first element if it doesn't start with the prompt
        # (e.g. the PuTTY log header)
        if not re.match(r"^\S+#", split_config[0]):
            split_config = split_config[1:]

        for single_command in split_config:
            lines = single_command.splitlines()

            # skip the command if no output is given
            if len(lines) > 1:
                # ensure that only the command is used as key
                if re.match(r"^\S+#", lines[0]):
                    cmd = lines[0].split("#", 1)[1]
                else:
                    cmd = lines[0]

                commands[cmd] = "\n".join(lines[1:])

    return commands

That’s it. If we execute the script, the result looks similar to the following:

The files now contain only the command output, and we have a predictable and structured representation of the session logs.

Work with multiple files

In the last two sections, I explained how to split a single log file. In my case, I received many more files, stored in multiple directories. For this reason, I wrote a helper function that returns the paths of all files with a given extension in a specific directory, including all subdirectories. I used the following code to collect this information:

import os


def get_files_in_path(root_dir, only_ext="log"):
    """returns a list with all files from the given directory and its subdirectories"""
    # os.walk() yields a (dirpath, dirnames, filenames) tuple for every directory
    files = [os.path.join(dirpath, file)
             for (dirpath, dirnames, filenames) in os.walk(root_dir)
             for file in filenames]

    if only_ext:
        # accept the extension with or without the leading dot
        ends_with = only_ext if only_ext[0] == "." else "." + only_ext
        return [file for file in files if file.endswith(ends_with)]

    else:
        return files

It looks more complicated than it really is. The main part is the list comprehension in the first statement. It uses os.walk(), which yields a (dirpath, dirnames, filenames) tuple for the given directory and each of its subdirectories. List comprehensions are a powerful feature of Python; if you don't know them yet, take a look at the official Python documentation.
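If you prefer the pathlib module, a similar helper can be written with Path.rglob(), which also searches all subdirectories. This is an alternative sketch, not the function used in the script. The name get_log_files is hypothetical, and unlike the original it always filters by extension and returns the paths in sorted order:

```python
from pathlib import Path
import tempfile


def get_log_files(root_dir, ext="log"):
    """hypothetical pathlib variant: all files below root_dir with the given extension"""
    suffix = ext if ext.startswith(".") else "." + ext
    return sorted(
        str(path) for path in Path(root_dir).rglob("*" + suffix) if path.is_file()
    )


# usage example with a temporary directory tree
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "site1").mkdir()
    (Path(tmp) / "site1" / "SwitchA.log").write_text("SwitchA#")
    (Path(tmp) / "SwitchB.log").write_text("SwitchB#")
    (Path(tmp) / "notes.txt").write_text("ignored")
    found = [Path(p).relative_to(tmp).as_posix() for p in get_log_files(tmp)]
    print(found)  # ['SwitchB.log', 'site1/SwitchA.log']
```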

If you look at the entire script, you'll find the following main part, which loads all files from a given directory, parses their content, and creates the structured output in the format root directory >> hostname >> command.txt. It is quite easy to understand:

import os
import re

if __name__ == "__main__":
    INPUT_DIRECTORY = "_input"
    OUTPUT_DIRECTORY = "_output"

    files = get_files_in_path(INPUT_DIRECTORY, only_ext="log")

    for file_path in files:
        if not os.path.isfile(file_path):
            print("File not found or no file: %s -- skip it" % file_path)

        else:
            with open(file_path) as f:
                raw_data = f.read()

            # the file name contains only the hostname followed by an extension
            hostname = os.path.splitext(os.path.basename(file_path))[0]

            # split the file into a <command>: <content> dictionary
            result = split_config_file(raw_data)

            # write the results to <OUTPUT_DIRECTORY>/<hostname>/<command>.txt
            root_dir = os.path.join(OUTPUT_DIRECTORY, hostname)
            os.makedirs(root_dir, exist_ok=True)
            for command, output in result.items():
                # replace characters that are not allowed in file names,
                # e.g. the "/" in "show interfaces Gi1/0/1"
                filename = re.sub(r'[\\/:*?"<>|]', "_", command.strip())
                with open(os.path.join(root_dir, "%s.txt" % filename), "w") as f:
                    f.write(output)

Further improvement and use-case

I know that this is quite a basic topic, and I recommend collecting the data from devices automatically, e.g. using netmiko or Ansible, but in some cases it's not possible to gather the data directly from the devices. Now you have a useful tool to structure this type of information, and I hope it helps somebody. One improvement to the script could be the automatic detection of the hostname, which on Cisco IOS is part of the prompt, but for my scenario, the file-based approach was sufficient.
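A minimal sketch of that improvement could look like this. It assumes the default Cisco IOS prompt format (hostname, an optional mode in parentheses, then "#") and picks the most frequent hostname found in the log, so a single odd match in the output doesn't decide the result. The name detect_hostname is hypothetical:

```python
import re
from collections import Counter

# hostname, optional mode such as "(config)" or "(config-if)", then "#",
# at the beginning of a line (assumes the default Cisco IOS prompt)
PROMPT_RE = re.compile(r"^([^\s#()]+)(?:\([^)]*\))?#", re.MULTILINE)


def detect_hostname(raw_data):
    """hypothetical helper: return the most common prompt hostname, or None"""
    counts = Counter(match.group(1) for match in PROMPT_RE.finditer(raw_data))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


sample = (
    "SwitchA#show clock\n"
    "*10:00:00.000 UTC Sat Aug 20 2016\n"
    "SwitchA#configure terminal\n"
    "SwitchA(config)#end\n"
    "SwitchA#"
)
print(detect_hostname(sample))  # SwitchA
```

With such a helper, the log files would no longer need the hostname in their file names.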

That’s it for today. Thank you for reading.