DadOverflow.com

Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Page 13 of 57

Python and SOAP

Simple Object Access Protocol, or SOAP, was the new hotness in web service technology…some 15 or 20 years ago. It was built around XML, Web Service Definition Language (WSDL), XML namespaces and other complex ideas.

Today’s web service standard is Representational state transfer (REST)–a much simpler approach to data transmission over the web. Instead of trading around clunky XML files, REST APIs typically leverage sleeker JSON (JavaScript Object Notation) documents in their communication.

But some things never die and, recently, I found myself elbow-deep into a number of SOAP APIs while trying to pull data from a vendor product. I wrote a Python client to interface with those APIs. While Python has a number of packages designed to work with the technology, I wanted to stick with just the requests package to keep my dependencies minimal. Ultimately, my client worked well and I wanted to share a few tidbits here that I learned along the way to get my requests code to successfully call SOAP web services.

1. The header can be tricky

Getting your header right is critical to successful service calls. I found two header elements essential for my code: SOAPAction and Content-Type. It was important that I set SOAPAction to a url corresponding with the particular web method I wished to call. The vendor documentation was pretty important here to determine what that url should be.

What’s interesting about Content-Type is that the web is full of valid suggestions for the proper value: text/xml and application/soap+xml are two that I’ve seen bandied about. In my case, neither value worked. Again from the vendor documentation, the value that made my calls work was application/x-www-form-urlencoded. So, my header dictionary looked roughly like this:

headers = {'SOAPAction': 'http://somesite.com/webservices/SomeMethod', 'Content-Type': 'application/x-www-form-urlencoded'}

2. The post data doesn’t necessarily need to be XML

Crazy notion, right? Posting non-XML to a SOAP API? Early on in my work, I kept trying to format all my post arguments into a single XML document and tried to push that document to the web method with my requests call, but the code would never work. At some point, I stumbled upon a forum or discussion thread where one of the participants posted code that actually used a dictionary for his post data object–what you would normally do with a REST API. I was taken aback but gave it a go and, to my astonishment, it worked! Some web methods required simple parameters like strings and integers, ready made for Python dictionaries. A few did have a parameter or two of XML. For those, I simply had to push a string representation of a properly formatted XML document. My code looked something like this:

import requests

str_xml = '<some_doc><some_elem>1</some_elem></some_doc>'
post_data = {'token': 'blah', 'search_xml': str_xml}
ws_url = 'https://somesite.com/ws/something.asmx/SomeMethod'
resp = requests.post(ws_url, data=post_data, headers=headers)

3. Parsing the XML response can be tricky

I “believe” the most appropriate way to deal with XML responses in the response object is through the content property. But, since the response is supposed to be XML, I wanted to run the content through ElementTree to get a proper XML document I could more easily process. In my early attempts, I passed the content value to ElementTree’s fromstring function to get back a proper XML document that I can process like any other XML document. Or so I thought.

The rub is that fromstring returns an XML element, not an XML document. You have to add one more line, a call to the ElementTree constructor itself, you get the proper XML document object you can use in the rest of your code. My response processing code then looked like this:

import xml.etree.ElementTree as ET


resp = requests.post(ws_url, data=post_data, headers=headers)

resp_elem = ET.fromstring(resp.content)
resp_doc = ET.ElementTree(resp_elem)

# now, you can use functions like find and findall with the resp_doc object

So, the next time you find yourself having to work with SOAP APIs–and hopefully you don’t–there are some handy tips and tricks to consider.

Learning Guitar with Python, Part 3

Past posts in this sorta-series: Part 1 and Part 2.

Pentatonic scales are quite popular among guitar players. If you’re rusty on your Greek, “Penta” means “five”. Thus, these scales only have five notes instead of seven–the fourth and seventh notes are dropped from their modal counterparts.

The major and minor pentatonic scales that I learned are truncated versions of the Ionian and Aeolian modal scales, respectively. My guitar teacher also showed me a longer form of both the major and minor pentatonic scales: scales that traverse more frets than the normal four-fret span. Furthermore, the long-form minor scale includes a feature called “The House,” where the notes on the fretboard resemble a house:

Bad rendering of how notes in the minor pentatonic scale resemble a house

“The House” is, apparently, a popular shape used by lots of famous guitar players.

To help me learn these scales, as I have with previous scales, I coded them up in Python in a Jupyter notebook and displayed them in HTML. Here’s the code I wrote:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display_html

%matplotlib inline

major_pentatonic = {'Low E':['','1','','2'], 'A':['3','','','5'], 'D':['6','','','1'], 'G':['2','','3',''], 
                    'B':['','5','','6'], 'High E':['','1','','2']}
minor_pentatonic = {'Low E':['6','','','1'], 'A':['2','','3',''], 'D':['5','','6',''], 'G':['1','','2',''], 
                    'B':['3','','','5'], 'High E':['6','','','1']}
major_pent_long = {'Low E':['1','','2','','3','','',''], 'A':['','','5','','6','','',''], 
                   'D':['','','1','','2','','3',''], 'G':['','','','','5','','6',''], 
                   'B':['','','','','','1','','2'], 'High E':['','','','','3','','','5']}
minor_pent_long = {'Low E':['5','','6','','','','',''], 'A':['1','','2','','3','','',''], 
                   'D':['','','5','','6','','',''], 'G':['','','1','','2','','3',''], 
                   'B':['','','','','','5','','6'], 'High E':['','','','','','1','','2']}

pent_scales = {'Major Pentatonic':major_pentatonic, 'Minor Pentatonic':minor_pentatonic, 
               'Major Pentatonic (Long)':major_pent_long, 'Minor Pentatonic (The House)': minor_pent_long}
pent_html = '<h3>Pentatonic Scale Shapes</h3>'

for i, scale_name in enumerate(pent_scales):
    nbr_of_frets = len(pent_scales[scale_name]['Low E'])
    df_scale = pd.DataFrame(pent_scales[scale_name], index=np.arange(1, nbr_of_frets+1))
    # https://stackoverflow.com/a/50899244
    df_scale_styler = df_scale.style.set_table_attributes("style='display:inline'").set_caption(scale_name)
    pent_html += df_scale_styler._repr_html_() + '&nbsp;&nbsp;&nbsp;'
    if (i+1) % 2 == 0:
        pent_html += '<p></p>'

display_html(pent_html, raw=True)

And that code rendered this graphic:

Pentatonic scales rendered in a Jupyter notebook

So now I can show my notebook in a computer monitor and play along to the scale.

More to come!

Parsing Oddly Formatted Spreadsheets

Python and pandas works well with conventionally formatted spreadsheets like this:

Conventional spreadsheet easily parsed in Python

But how do you deal with spreadsheets formatted in unconventional ways, like this?

Can you use pandas to parse an oddly formatted spreadsheet?

Here’s my approach to massaging this data into a dataframe I can work with.

Step 1: Go ahead and read in the wonky spreadsheet

Go ahead and read in the spreadsheet, warts and all, into a dataframe. I went ahead and skipped rows 0 and 1 as they were unnecessary:

import pandas as pd

df_raw = pd.read_excel('./data/odd_format.xlsx', skiprows=1).fillna('')

As you’d expect, the results are not very pretty:

Step 2: Figure out where each record starts and stops

Looking at the spreadsheet, I determined that each record starts with a field named “First Name:” and ends with a field named “State:”. If I can put together a list of row indexes that lets me know where each record begins and ends, I should be able to iterate through that list and reformat each record uniformly. Pandas filtering can help with that. To get a list of each “start” row, I can use this code:

df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist()

To get a list of each “end” row, I can do this:

df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()

Finally, I can use Python’s handy zip function to glue both together in a list of tuples that I can easily loop through:

for start_row, end_row in zip(df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist(), df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()):
    # loop through each record

Step 3: Collect all key/value pairs per record

Now that I’m able to iterate over each record, I need to be able to capture each key/value pair in each record: each person’s first name, middle name (if available), last name, etc. I can use Python’s range function to loop from the starting row to the ending row of the record and pandas iloc function to zero in on each key and associated value:

person = {}  # I need some place to store the keys/values, so let's use a dictionary
for i in range(start_row, end_row+1):
    k = df_raw.iloc[i, 1]  # the keys are in column 1
    v = df_raw.iloc[i, 2]  # the values are in column 2

Each record has an empty row in the middle of it, separating “name” properties from “address” properties. I don’t need those empty rows, so I do a quick check before writing the keys and values to my dictionary object:

if len(k.strip()) > 0:
    person[k.strip().replace(':', '')] = v

Of course, I need to be writing each of these person objects to a master list, so I do that by appending each object:

people_list.append(person)

Step 4: Create a new dataframe from the people list

Finally, I can take that clean list of dictionaries and generate a new dataframe from it:

df_clean = pd.DataFrame(people_list)

Which renders a nice dataframe from which I can start my analysis:

That’s a little more like it!

So, putting it all together, my full code looks like this:

import pandas as pd

df_raw = pd.read_excel('./data/odd_format.xlsx', skiprows=1).fillna('')
people_list = []
for start_row, end_row in zip(df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist(), df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()):
    person = {}
    for i in range(start_row, end_row+1):
        k = df_raw.iloc[i, 1]
        v = df_raw.iloc[i, 2]
        
        if len(k.strip()) > 0:
            person[k.strip().replace(':', '')] = v
            
    people_list.append(person)
    
df_clean = pd.DataFrame(people_list)

So, should you encounter similarly unconventionally formatted spreadsheets in the future, hopefully this code will help you find a solution to deal with them!

« Older posts Newer posts »

© 2025 DadOverflow.com

Theme by Anders NorenUp ↑