Parsing YAML in Python: A Crawl, Walk, Run Approach

I consider myself inexperienced when it comes to parsing data in Python. A quick Google search yields a ton of examples on how to parse a simple dictionary or list, but the reality is that the data for network automation is much more complex. It’s often a dictionary…of dictionaries…and even perhaps of dictionaries…with an embedded list or two thrown in for confusion. I always spend way too much time struggling to get the specific values from the results. If I need multiple results for multiple devices I usually end up under my desk in the fetal position frustrated, mad, dejected and likely crying. It’s a skill that I don’t use often enough to have dedicated the time and energy to fully understanding it, but alas now is the time.

Other examples I’ve seen online try to explain these concepts using data sets of birds, fruits, vegetables, etc., but as a network engineer you likely don’t work with those things daily. I’m going to walk through four YAML data files containing information about network devices. This is directly applicable to almost any inventory file or config generation you will encounter, whether you’re using Ansible, Jinja2, Nornir, NAPALM, or straight Python. I’ll start simple and progress to more complex structures. My examples are by no means the only way to iterate through data, but they are what makes sense to me. My hope is that this helps you understand how to iterate through data in a clear and concise manner, without your coworkers having to coax you out from under your desk.

All of the files and script info used in this post can be found on GitHub if you want to use them for yourself.

Crawl: Basic Dictionary

Ok, let’s start with a simple YAML file that is nothing more than three devices. The file starts with ‘---’, which is the YAML document start marker. YAML files contain key-value pairs (K:V), with the key and value separated by a colon (:). In this case there are three key-value pairs…the key for each pair is the device name, and there are no values currently (more on that in a minute).

[ex1.yaml]

---
router1:
router2:
switch1:

Now, we want to take the contents of the file ex1.yaml and get them into Python so we can do something useful (i.e. sort, filter, use as an input for a template, etc). The easiest way to do this is to run a simple script that stores the contents of the file in a Python variable called ‘result’.

[yaml_import.py]

# Imports the YAML module for use in our script
import yaml

# Opens the file ex1.yaml, and loads the contents in the variable 'result'
with open('ex1.yaml') as f:
    result =  yaml.safe_load(f)

We now have a variable called ‘result’. Let’s check a couple of things. First, let’s see the contents of the variable:

print(result)
{'router1': None, 'router2': None, 'switch1': None}

So what do we have here? We have a single dictionary, which we can tell by the single set of curly brackets { } surrounding the output. The dictionary consists of three key:value pairs. The first pair has a key of ‘router1’ and a value of None. The second pair has a key of ‘router2’ and a value of None. The third and final pair has a key of ‘switch1’ and a value of None. One thing to note in this example is the None value. In Python, None is an object of class NoneType, which is different from an empty string (‘’), an object of class str with nothing in it. Not super important in our case, but worth noting.
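The difference between None and an empty string can be checked directly. This is a minimal sketch that uses a hand-typed dictionary mirroring the output above rather than reloading the file; the variable names is_none and is_empty_string are mine.

```python
# Hand-typed copy of the dictionary that was printed for ex1.yaml
result = {'router1': None, 'router2': None, 'switch1': None}

value = result['router1']

# None is its own object, so the usual test is 'is None'
is_none = value is None

# An empty string is a different object with a different type
is_empty_string = value == ''

print(is_none)          # True
print(is_empty_string)  # False
print(type(value))      # <class 'NoneType'>
```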

Let’s check the type of the result variable to verify it’s a dictionary:

print (type(result))
<class 'dict'>

Yep, it’s a dictionary. So far so good. One more thing to mention about dictionaries while we are still in a simple example: you look up their key:value pairs by key, not by index position. Older versions of Python treated dictionaries as UNORDERED; since Python 3.7 they keep the order in which items were inserted, but you still cannot ask a dictionary for “item 0”. This is in contrast to lists, which are ordered items starting at index 0. Lists are defined by the use of square brackets [ ] and contain single items separated by commas.
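To make the dictionary versus list distinction concrete, here is a small sketch. It assumes Python 3.7 or later, where dictionaries keep insertion order, and the devices list is typed by hand for illustration.

```python
# Hand-typed copy of the dictionary from ex1.yaml
result = {'router1': None, 'router2': None, 'switch1': None}

# Iterating a dictionary yields its keys in insertion order (Python 3.7+)
key_order = list(result)
print(key_order)

# A dictionary is looked up by key, not by position, so 0 is treated
# as a key that does not exist
try:
    result[0]
    positional_lookup_worked = True
except KeyError:
    positional_lookup_worked = False
print(positional_lookup_worked)

# A list is looked up by position, starting at index 0
devices = ['router1', 'router2', 'switch1']
print(devices[0])
```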

Now that we understand the basics of a dictionary let’s look at a different source file with key:value both defined.

[ex2.yaml]

---
router1: 10.1.1.1
router2: 10.1.1.2
switch1: 10.10.10.50

Ex2.yaml is similar to the first file except the values are now the IP addresses of each of the keys (devices) instead of None. Looking at the variable in Python shows a very similar output and type:

print(result2)
{'router1': '10.1.1.1', 'router2': '10.1.1.2', 'switch1': '10.10.10.50'}

print (type(result2))
<class 'dict'>

Now that we have a dictionary of key:value pairs representing devices and IP addresses, how do we do something with that data? There are several methods to interact with dictionaries. For working with device and inventory info for network automation, we will be focusing specifically on .items(), .keys(), .values(), and .get(<key>). To see a full list of dictionary methods available see https://realpython.com/python-dicts/#dgetltkeygt-ltdefaultgt.

While the methods below allow us to interact with dictionaries, in Python 3 they return something called dictionary views, which provide dynamic views of the underlying objects. This isn’t super important, but it does mean that the results are listed as class type ‘dict_<items|keys|values>’, which may be confusing.

Dictionary View Examples:

print(type(items))
<class 'dict_items'>

print(type(keys))
<class 'dict_keys'>

print(type(values))
<class 'dict_values'>

To that end, I’ve enclosed the method calls with the list() syntax to make them regular lists.

List Examples:

print(type(items))
<class 'list'>

print(type(keys))
<class 'list'>

print(type(values))
<class 'list'>
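Here is a short sketch of why the list() wrapper matters. The dictionary is typed by hand to match ex2.yaml, and ‘switch2’ is a made-up device added only to show that a view follows the dictionary while a list is a copy.

```python
# Hand-typed copy of the dictionary from ex2.yaml
result2 = {'router1': '10.1.1.1', 'router2': '10.1.1.2', 'switch1': '10.10.10.50'}

keys_view = result2.keys()        # dynamic view of the dictionary
keys_list = list(result2.keys())  # snapshot copied into a regular list

# 'switch2' is a hypothetical device, added only for this demonstration
result2['switch2'] = '10.10.10.51'

print(len(keys_view))  # 4, the view follows the dictionary
print(len(keys_list))  # 3, the list is a copy taken earlier

# Views cannot be indexed by position, lists can
try:
    keys_view[0]
    view_is_indexable = True
except TypeError:
    view_is_indexable = False
print(view_is_indexable)  # False
print(keys_list[0])       # router1
```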

.items()

This method returns the key:value pairs in a dictionary. Wrapped in list(), the result is a list, which is ORDERED and is noted by square brackets [ ]. In our case, each list item is a tuple, which is immutable (not changeable) and is noted by parentheses ( ).

items = list(result2.items())

print(items)
[('router1', '10.1.1.1'), ('router2', '10.1.1.2'), ('switch1', '10.10.10.50')]

print(type(items))
<class 'list'>

Let’s print out switch1’s device name and its corresponding IP. This requires us to print the tuple values in the index 2 position of the list. Indexing always starts at 0, and since switch1 is the third item in the list you count 0…1…2. You can then reference each item by its tuple position: 0 for name and 1 for IP.

print(items[2][0] + " has an IP address of " + (items[2][1]))

switch1 has an IP address of 10.10.10.50
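As an alternative to double indexing, the tuple can be unpacked into two named variables. This is only a sketch of another way to write the same line; the names name, ip, and message are mine.

```python
# Hand-typed copy of the dictionary from ex2.yaml
result2 = {'router1': '10.1.1.1', 'router2': '10.1.1.2', 'switch1': '10.10.10.50'}
items = list(result2.items())

# Unpack the third tuple into two variables instead of using [2][0] and [2][1]
name, ip = items[2]

message = f"{name} has an IP address of {ip}"
print(message)
```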

.keys()

This method returns a list of the keys in a dictionary. In our case, it’s the device list.

keys = list(result2.keys())

print(keys)
['router1', 'router2', 'switch1']

print(type(keys))
<class 'list'>

We could count the number of devices in our inventory:

count = 0
for device in keys:
    count += 1
print("Device Count: " + str(count))

Device Count: 3
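The loop above works, but Python’s built-in len() gives the same count in one step. A minimal sketch, again with the dictionary typed by hand:

```python
# Hand-typed copy of the dictionary from ex2.yaml
result2 = {'router1': '10.1.1.1', 'router2': '10.1.1.2', 'switch1': '10.10.10.50'}
keys = list(result2.keys())

# len() counts the items in the list without a manual counter
device_count = len(keys)
print("Device Count: " + str(device_count))

# len() also works on the dictionary itself and counts its keys
print(len(result2))
```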

.values()

This method returns a list of the values in a dictionary. In our case, it’s a list of IP addresses.

values = list(result2.values())

print(values)
['10.1.1.1', '10.1.1.2', '10.10.10.50']

print(type(values))
<class 'list'>

We could print the first value of the list:

print(values[0])
10.1.1.1

.get(<key>)

This method returns the value for a key if it exists in the dictionary. If it doesn’t exist it will return None. You can also pass the default argument which states what to return if it doesn’t exist. In our example, we can look for a device, and get the corresponding IP address.

get = result2.get('switch1')

print(get)
10.10.10.50

print(type(get))
<class 'str'>
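The default argument is easiest to see with a device that is not in the inventory. In this sketch ‘switch9’ is a made-up name chosen because it does not exist in the dictionary.

```python
# Hand-typed copy of the dictionary from ex2.yaml
result2 = {'router1': '10.1.1.1', 'router2': '10.1.1.2', 'switch1': '10.10.10.50'}

# 'switch9' is a hypothetical device that is not in the inventory
missing = result2.get('switch9')              # no default, so None comes back
fallback = result2.get('switch9', 'unknown')  # default value is returned instead

# Square-bracket lookup raises KeyError for a missing key
try:
    result2['switch9']
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(missing)           # None
print(fallback)          # unknown
print(raised_key_error)  # True
```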

So far we’ve seen how to extract all keys (device list in our case), all values (IP address list in our case), and values based on key (IP for a device) from a simple dictionary. Not overly useful alone, but it opens the door for more useful operations. What if we wanted to generate config to change the device name for all devices and append our domain name for standardization? Instead of the repetitive and error-prone copy/paste, or Notepad++ jujitsu needed to accomplish this, you could simply create a small for loop using the results of the device list (stored as ‘keys’) from above.

for device in keys:
    print("hostname " + device + ".domain.local")

hostname router1.domain.local
hostname router2.domain.local
hostname switch1.domain.local

We now have actual config. We could even take it a step further and break up the commands. Let’s say you were providing instructions to an engineer to perform manually. You could give them a script like the following:

for device in keys:
    print(device + " config template")
    print("##################################################")
    print("config t")
    print("hostname " + device + ".domain.local")
    print("end")
    print("copy run start")
    print("exit")
    print("##################################################")

router1 config template
##################################################
config t
hostname router1.domain.local
end
copy run start
exit
##################################################
router2 config template
##################################################
config t
hostname router2.domain.local
end
copy run start
exit
##################################################
switch1 config template
##################################################
config t
hostname switch1.domain.local
end
copy run start
exit
##################################################

While this is a very simple example, it shows:

We can import data from YAML as a dictionary and iterate through it
We can do something useful with the data

A better option would be to store each of the device configs in a variable and use automation to connect to the devices and deploy the config changes accordingly. Better yet, check to see if the config is already deployed, and only change the hostname if it doesn’t match what we want, but I digress.
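Storing each device’s config in a variable might look something like the sketch below. It only builds the text; connecting to devices is out of scope here, and the configs dictionary and its layout are my own choice, not part of the original scripts.

```python
# Device names as returned earlier by list(result2.keys())
keys = ['router1', 'router2', 'switch1']

# Build one config string per device and keep it in a dictionary
configs = {}
for device in keys:
    lines = [
        "config t",
        "hostname " + device + ".domain.local",
        "end",
        "copy run start",
    ]
    configs[device] = "\n".join(lines)

# The stored text can be printed now or handed to another tool later
print(configs['router2'])
```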

Before we move on to the next data file, let’s look at a couple more use cases. Let’s check to see if an IP address is in our inventory (values).

ip = "10.1.1.1"
print(ip in result2.values())
True

ip = "10.255.255.254"
print(ip in result2.values())
False
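Knowing that an IP is in the inventory naturally leads to asking which device owns it. One possible approach, assuming every IP appears only once, is to flip the dictionary so the IP becomes the key. The name ip_to_device is mine.

```python
# Hand-typed copy of the dictionary from ex2.yaml
result2 = {'router1': '10.1.1.1', 'router2': '10.1.1.2', 'switch1': '10.10.10.50'}

# Flip keys and values; this assumes each IP is used by only one device
ip_to_device = {ip: device for device, ip in result2.items()}

owner = ip_to_device.get('10.1.1.2')
no_owner = ip_to_device.get('10.255.255.254')

print(owner)     # router2
print(no_owner)  # None
```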

We can also create a report:

print("###########################")
print("Device Inventory Report")
print("###########################\n")
for device,ip in result2.items():
    print("Device Name: " + device + "\n")
    print("  IP Address: " + ip + "\n\n")
print("###########################")
print("End of Inventory")

###########################
Device Inventory Report
###########################
Device Name: router1
  IP Address: 10.1.1.1

Device Name: router2
  IP Address: 10.1.1.2

Device Name: switch1
  IP Address: 10.10.10.50

###########################
End of Inventory

Hopefully you get the idea of what’s possible and there are a ton of tasks you could perform. We are now ready to add another level of complexity. If you want to take a break, get something to drink, buy that thing on Amazon, now would be a good time.

Walk: Nested Dictionaries

Ok, time to level up our complexity. Let’s look at ex3.yaml.

[ex3.yaml]

---
router1:
  site: atlanta
  mgmt_ip: 10.1.1.1

router2:
  site: chicago
  mgmt_ip: 10.1.1.2

switch1:
  site: atlanta
  mgmt_ip: 10.10.10.50

What’s new in this file? Well, we added site as a key for each device. We also moved the IP from being the value in the key:value pair from our previous example to its own key:value pair. Take note of the indentation in the YAML file. YAML exclusively uses spaces…NOT TABS! The convention is two spaces for nesting items and it’s what I’ve used. Let’s take a look at the ‘result3’ variable.

{'router1': {'site': 'atlanta', 'mgmt_ip': '10.1.1.1'}, 'router2': {'site': 'chicago', 'mgmt_ip': '10.1.1.2'}, 'switch1': {'site': 'atlanta', 'mgmt_ip': '10.10.10.50'}}

<class 'dict'>

You can see that we still have a dictionary with each key being the device name. The values, however, are also a dictionary containing two key:value pairs with site and mgmt_ip as the keys. Adjusting the text shows the hierarchy a little better:

{
'router1': {'site': 'atlanta', 'mgmt_ip': '10.1.1.1'},
'router2': {'site': 'chicago', 'mgmt_ip': '10.1.1.2'},
'switch1': {'site': 'atlanta', 'mgmt_ip': '10.10.10.50'}
}

Adding more whitespace, you can see it starts to look very similar to the original YAML. The point is to show that the data is indeed structured. This is often a good exercise to do when you are dealing with nested data, as you can get a much better understanding of the structure by adding whitespace. You can do this manually, or you can use a site like https://onlineyamltools.com/prettify-yaml. Simply paste the text in the left, and it will spit out a pretty version indented if it’s valid.

{
'router1': 
  {
  'site': 'atlanta', 
  'mgmt_ip': '10.1.1.1'
  },
'router2': 
  {
  'site': 'chicago', 
  'mgmt_ip': '10.1.1.2'
  },
'switch1': 
  {
  'site': 'atlanta', 
  'mgmt_ip': '10.10.10.50'
  }
}

When I see nested dictionaries, the first thing that pops into my head is that I need an ordered list of the keys in the outer dictionary. In our case, it’s a list of devices…which we already know. We can refer back to our earlier code that created a list of key values of our dictionary.

keys = list(result3.keys())

print(keys)
['router1', 'router2', 'switch1']

print(type(keys))
<class 'list'>

Now the magic happens. We have a list of our dictionary keys defined in the variable ‘keys’, so let’s create a report by looping through each item in the list, and referencing it to pull the site and mgmt_ip values out of ‘result3’.

for device in keys:
    print("Device Name: " + device)
    print("  Site: " + result3[device]['site'])
    print("  MgmtIP: " + result3[device]['mgmt_ip'])

Device Name: router1
  Site: atlanta
  MgmtIP: 10.1.1.1
Device Name: router2
  Site: chicago
  MgmtIP: 10.1.1.2
Device Name: switch1
  Site: atlanta
  MgmtIP: 10.10.10.50

We can also use this to pull out individual values. Here’s an example where we type in a device name and the script returns the site and mgmt_ip values.

print('Enter a device:')
dev_input = input()
print("The device " + dev_input + " is located in " + result3[dev_input]['site'] + " and has a management IP of " + result3[dev_input]['mgmt_ip'])

Enter a device:
router2
The device router2 is located in chicago and has a management IP of 10.1.1.2
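If someone types a device name that isn’t in the file, the lookup above stops with a KeyError. One way to handle that, sketched here with a helper function I’ve named describe_device (it is not part of the original scripts), is to check with .get() first.

```python
# Hand-typed copy of the dictionary from ex3.yaml
result3 = {
    'router1': {'site': 'atlanta', 'mgmt_ip': '10.1.1.1'},
    'router2': {'site': 'chicago', 'mgmt_ip': '10.1.1.2'},
    'switch1': {'site': 'atlanta', 'mgmt_ip': '10.10.10.50'},
}

def describe_device(inventory, name):
    """Return a one-line description, or a notice if the device is unknown."""
    details = inventory.get(name)
    if details is None:
        return "The device " + name + " is not in the inventory"
    return ("The device " + name + " is located in " + details['site']
            + " and has a management IP of " + details['mgmt_ip'])

print(describe_device(result3, 'router2'))
# 'router9' is a made-up name used to show the fallback message
print(describe_device(result3, 'router9'))
```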

Next, let’s get a list of unique sites from all of our devices. In this case, we will create a new unique list of sites. I have added an IF statement, created an initial blank list called all_sites, and iterate through each device to find its site. If the site isn’t in the all_sites list it appends it to the end.

all_sites = []
for device in keys:
    if result3[device]['site'] not in all_sites:
        all_sites.append(result3[device]['site'])

print(all_sites)
['atlanta', 'chicago']
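A related task is grouping devices by site. The sketch below is one way to do it with the same data; devices_by_site is a name I picked, and setdefault() creates the empty list the first time a site is seen.

```python
# Hand-typed copy of the dictionary from ex3.yaml
result3 = {
    'router1': {'site': 'atlanta', 'mgmt_ip': '10.1.1.1'},
    'router2': {'site': 'chicago', 'mgmt_ip': '10.1.1.2'},
    'switch1': {'site': 'atlanta', 'mgmt_ip': '10.10.10.50'},
}

# Build a dictionary of site -> list of device names
devices_by_site = {}
for device, details in result3.items():
    devices_by_site.setdefault(details['site'], []).append(device)

print(devices_by_site)

# The unique site list is simply the keys of the grouped dictionary
print(list(devices_by_site))
```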

Let’s recap what we’ve accomplished in this section. We imported YAML into Python which resulted in nested dictionaries. We created a list of values that referred to dictionary keys; and we used those values to extract specific values within the nested dictionaries. We also created another new list of items by iterating through the site key inside of the device dictionaries, inside the ‘result3’ dictionary.

Run: More Nesting of Dictionaries and Lists

Now that we have a good understanding of how YAML data can be structured and we know how to iterate through it we can now look at a more complex example. Ex4.yaml is a much more realistic example of what a YAML file would look like for network devices.

[ex4.yaml]

---
router1:
  site: atlanta
  dns:
    - dns_pri: '1.1.1.1'
    - dns_sec: '2.2.2.2'
  interfaces:
    GigabitEthernet1:
      description: Management
      ipv4addr: 10.1.1.1
    GigabitEthernet2:
      description: TO ROUTER2
      ipv4addr: 10.10.10.1

router2:
  site: chicago
  dns:
    - dns_pri: '3.3.3.3'
    - dns_sec: '4.4.4.4'
  interfaces:
    GigabitEthernet1:
      description: Management
      ipv4addr: 10.1.1.2
    GigabitEthernet2:
      description: TO ROUTER1
      ipv4addr: 10.10.10.2

switch1:
  site: atlanta
  dns:
    - dns_pri: '1.1.1.1'
    - dns_sec: '2.2.2.2'
  interfaces:
    GigabitEthernet1:
      description: Management
      ipv4addr: 10.1.1.50

Let’s see what we’re dealing with. First, we still have our devices at the outermost level. We also kept the site key:value pair as it was. We have added a section for DNS servers to be defined, and they are different depending on the site. Note that the DNS servers are indented in the YAML file, but also have a ‘-‘ which indicates a list member. Finally, we have added a section for interfaces that includes the interface name with description and ipv4 address as key:value pairs. Alright, let’s see what ‘result4’ looks like.

{'router1': {'site': 'atlanta', 'dns': [{'dns_pri': '1.1.1.1'}, {'dns_sec': '2.2.2.2'}], 'interfaces': {'GigabitEthernet1': {'description': 'Management', 'ipv4addr': '10.1.1.1'}, 'GigabitEthernet2': {'description': 'TO ROUTER2', 'ipv4addr': '10.10.10.1'}}}, 'router2': {'site': 'chicago', 'dns': [{'dns_pri': '3.3.3.3'}, {'dns_sec': '4.4.4.4'}], 'interfaces': {'GigabitEthernet1': {'description': 'Management', 'ipv4addr': '10.1.1.2'}, 'GigabitEthernet2': {'description': 'TO ROUTER1', 'ipv4addr': '10.10.10.2'}}}, 'switch1': {'site': 'atlanta', 'dns': [{'dns_pri': '1.1.1.1'}, {'dns_sec': '2.2.2.2'}], 'interfaces': {'GigabitEthernet1': {'description': 'Management', 'ipv4addr': '10.1.1.50'}}}}

Adding whitespace shows us that everything is valid. The output below is for router1 only as the rest have been omitted for brevity.

{
'router1': 
  {
  'site': 'atlanta', 
  'dns': 
    [
    {'dns_pri': '1.1.1.1'}, 
    {'dns_sec': '2.2.2.2'}
    ], 
  'interfaces': 
    {
    'GigabitEthernet1': 
      {
      'description': 'Management', 
      'ipv4addr': '10.1.1.1'
      }, 
    'GigabitEthernet2': 
      {
      'description': 'TO ROUTER2', 
      'ipv4addr': '10.10.10.1'
      }
    }
  }
...
}

As in earlier examples, let’s start by creating a list of dictionary keys that we can reference as needed later for the devices.

devices = list(result4.keys())

print(devices)
['router1', 'router2', 'switch1']

print(type(devices))
<class 'list'>

All we need now is to iterate through the results using the devices in the list as iterators and we can create a device list with all interfaces, descriptions, and IPs.

for device in devices:
    print("###################")
    print(device)
    for interface in result4[device]['interfaces']:
        print("\n" + interface)
        print("------------------")
        print("  " + result4[device]['interfaces'][interface]['description'])
        print("  " + result4[device]['interfaces'][interface]['ipv4addr'])
    print("###################")

###################
router1

GigabitEthernet1
------------------
  Management
  10.1.1.1

GigabitEthernet2
------------------
  TO ROUTER2
  10.10.10.1
###################
###################
router2

GigabitEthernet1
------------------
  Management
  10.1.1.2

GigabitEthernet2
------------------
  TO ROUTER1
  10.10.10.2
###################
###################
switch1

GigabitEthernet1
------------------
  Management
  10.1.1.50
###################

There are two important things to note in the example above.

We are utilizing nested FOR loops. We need to do that because the outer loop iterates through each device and the inner loop iterates through each interface (for interface in result4[device][‘interfaces’]) of that device.
Each lookup into the next nested level needs to be exact. Notice that [‘interfaces’] contains single quotes, which means it will match that text exactly. However, [interface] has no quotes, which means it’s referencing the FOR loop iteration variable, which holds the name of the interface currently being processed. Depending on the structure of the YAML file you are working with, you will need to adjust your syntax accordingly.
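The same nested loops can also be written with .items(), which hands you the key and its value together so the long result4[device][‘interfaces’][interface] lookups aren’t repeated. This is a sketch on a trimmed, hand-typed copy of the data, and it assumes every interface uses the key ‘ipv4addr’; the rows list is my own addition.

```python
# Trimmed, hand-typed copy of result4 with only the interface data
result4 = {
    'router1': {'interfaces': {
        'GigabitEthernet1': {'description': 'Management', 'ipv4addr': '10.1.1.1'},
        'GigabitEthernet2': {'description': 'TO ROUTER2', 'ipv4addr': '10.10.10.1'},
    }},
    'switch1': {'interfaces': {
        'GigabitEthernet1': {'description': 'Management', 'ipv4addr': '10.1.1.50'},
    }},
}

# Outer loop: device name plus its details dictionary
# Inner loop: interface name plus its settings dictionary
rows = []
for device, details in result4.items():
    for interface, settings in details['interfaces'].items():
        rows.append((device, interface, settings['description'], settings['ipv4addr']))

for row in rows:
    print(" | ".join(row))
```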

Similarly, we can pull out individual values based on input as well. For example, to get an interface count of a device you could do the following.

print('Enter a device:')
dev_input = input()
count = 0
for interface in result4[dev_input]['interfaces']:
    count += 1
print(dev_input + " has " + str(count) + " interface(s)")

Enter a device:
router1
router1 has 2 interface(s)

Enter a device:
switch1
switch1 has 1 interface(s)

We have one last item to cover and it involves lists nested in a nested dictionary. This is very common when looking at multiple values for a given setting. I’ve seen it used for identifying BGP peers, interfaces, NTP servers, and several other options. For our example, let’s start by getting the primary DNS server for router2.

device = 'router2'
print(result4[device]['dns'][0]['dns_pri'])

3.3.3.3

How exactly did that work? Well, we looked at ‘result4’ and found the device router2. We then looked into the dns section of router2 which is a list that looks like this.

[{'dns_pri': '3.3.3.3'}, {'dns_sec': '4.4.4.4'}]

print(type(result4[device]['dns']))
<class 'list'>

The primary DNS server is at the index 0 and the secondary DNS server is at index 1. Since we want the primary DNS server we are going to pull index 0. In our case, index 0 in the list is another dictionary.

{'dns_pri': '3.3.3.3'}

print(type(result4[device]['dns'][0]))
<class 'dict'>

Finally, we want the value for dns_pri so we add [‘dns_pri’] and we get the value 3.3.3.3.
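If remembering which index holds which server gets tedious, one option is to merge the list of single-key dictionaries into one dictionary first. This is a sketch of that idea, not something the original scripts do; dns_list is typed by hand from the router2 output above.

```python
# The dns list for router2, typed by hand from the output above
dns_list = [{'dns_pri': '3.3.3.3'}, {'dns_sec': '4.4.4.4'}]

# Merge each single-key dictionary into one combined dictionary
dns = {}
for entry in dns_list:
    dns.update(entry)

print(dns)             # {'dns_pri': '3.3.3.3', 'dns_sec': '4.4.4.4'}
print(dns['dns_pri'])  # 3.3.3.3
print(dns['dns_sec'])  # 4.4.4.4
```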

Our final example will create standardized configs based on the YAML file data. I recommend using Jinja2 as an input of the data to create templates, but since that’s outside of the scope of this article we’ll generate the config in Python. I am using a subnet mask of 255.255.255.0 for consistency. I could easily add mask:<value> as a key:value pair to the YAML file to include this data as well.

print("########Standardized Config Generator########")
for device in devices:
    print("--------Currently on Device " + device + " --------")
    print("snmp-server location " + result4[device]['site'])
    print("ip domain-lookup")
    print("ip name-server " + result4[device]['dns'][0]['dns_pri'] + " " + result4[device]['dns'][1]['dns_sec'])
    for interface in result4[device]['interfaces']:
        print("interface " + interface)
        print("  description " + result4[device]['interfaces'][interface]['description'])
        print("  ip address " + result4[device]['interfaces'][interface]['ipv4addr'] + " 255.255.255.0")
    print("\n")

########Standardized Config Generator########
--------Currently on Device router1 --------
snmp-server location atlanta
ip domain-lookup
ip name-server 1.1.1.1 2.2.2.2
interface GigabitEthernet1
  description Management
  ip address 10.1.1.1 255.255.255.0
interface GigabitEthernet2
  description TO ROUTER2
  ip address 10.10.10.1 255.255.255.0

--------Currently on Device router2 --------
snmp-server location chicago
ip domain-lookup
ip name-server 3.3.3.3 4.4.4.4
interface GigabitEthernet1
  description Management
  ip address 10.1.1.2 255.255.255.0
interface GigabitEthernet2
  description TO ROUTER1
  ip address 10.10.10.2 255.255.255.0

--------Currently on Device switch1 --------
snmp-server location atlanta
ip domain-lookup
ip name-server 1.1.1.1 2.2.2.2
interface GigabitEthernet1
  description Management
  ip address 10.1.1.50 255.255.255.0
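Adding the mask to the YAML file could work along these lines. The ‘mask’ key and the DEFAULT_MASK fallback are assumptions for illustration; they are not in ex4.yaml. The dictionary is typed by hand to represent one device’s interfaces section after such a change.

```python
# Hypothetical interfaces section where only one interface defines 'mask'
interfaces = {
    'GigabitEthernet1': {'description': 'Management', 'ipv4addr': '10.1.1.1',
                         'mask': '255.255.255.128'},
    'GigabitEthernet2': {'description': 'TO ROUTER2', 'ipv4addr': '10.10.10.1'},
}

# Fallback used when an interface has no 'mask' key
DEFAULT_MASK = '255.255.255.0'

config_lines = []
for interface, settings in interfaces.items():
    mask = settings.get('mask', DEFAULT_MASK)
    config_lines.append("interface " + interface)
    config_lines.append("  description " + settings['description'])
    config_lines.append("  ip address " + settings['ipv4addr'] + " " + mask)

print("\n".join(config_lines))
```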

I hope these walkthroughs help you to better understand how to work with dictionaries, lists, YAML, and Python. The focus was on the data, how it is collected, and how to do something useful with it whether it’s used for reporting, configuration, or something else. With practice you’ll be able to slice through dictionaries, lists, and tuples containing network information like a boss. You will find a lot of value in using the data as an input to other tools for further automation.
