Why are you learning Python if you haven't needed it before?

Many of the tools I use have Python embedded right in there and I need to do things that the tools don't do natively.

I can read Python and understand it, but not really write it from scratch without stumbling through and reading lots of docs, StackOverflow, and GeeksForGeeks.

I need to manipulate some data using Alteryx that comes from an API call. For reasons that are dumb, but insurmountable, Alteryx cannot call this API directly.

Alteryx has Python embedded and even includes an embedded Jupyter IDE. I have Python code that makes the API calls and pulls down the data.

The only way to get the data out of the Python blob is using a pandas datastream. Any field that was a list or dictionary in the original data, is just a string when it comes out in the same pickle format you'd get if you did a print from Python.

Alteryx doesn't understand pickle format. It barely understand JSON. But I need to convert pickle to JSON and I cannot find a reliable way to do that just using character substitution and regular expressions.

I can't get the data out of Python in JSON format unless you know how to force pandas to represent the data as JSON instead of native Python dict and list. Alteryx properly converts the other datatypes to Alteryx-native datatypes, just not those.

There are tools in Alteryx to unnest objects and flatten arrays but they have to be valid JSON.

When I try just replacing ' with ", None with null, True with true, and False with false, I get errors which I assume are from replacing an embedded or escaped '.

Alteryx error messages are terrible. The input is 18,000 lines, each one a dict to be parsed. I convert them to what I think is valid JSON and send them to the JSON parser which returns this error Missing comma or '}' after an object member. at character position: 170. Not which line it happened on. Not any context where I can search or find it. Just that.

Would anyone have a reliable way to convert the strings of pickle to valid JSON?

#Python #Alteryx #JSON #PCRE #regex #Pickle

It was this.

"[concat(parameters('vmname'), '/', parameters('extensionname'))]"

Still, if anyone knows a cool and good way to detect single quotes embedded in a set of doublequotes and then escaping them using just pcre, that would be cool.

@jrdepriest You'll have to check this in Python, but in JavaScript strings and PCREs seem to allow arbitrary characters to be escaped using \x followed by the hexadecimal for the character.

$ node
Welcome to Node.js v18.19.1.
Type ".help" for more information.
> "\x27"
"'"
> /\x27/.test("a'b")
true
> /\x27/.test("ab")
false
>

@coderCyclist

I was able to get it working in Alteryx by using two regexes.

First REGEX_Replace([data], "'([^']*)': '([^']*)'", '"$1": "$2"') and then REGEX_Replace([data_parsed], "'([^']*)': "+'"([^"]*)"', '"$1": "$2"')

Edit: I didn't know I could use the + for string concatenation inside a REGEX_Replace command until today. That will come in handy.

@jrdepriest That's not in Pickle (Python-specific serialization) format. From a quick web search, looks like an Azure Resource Management template string. Maybe Azure has some generic helpers to deal with those?

https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/template-expressions

Template syntax and expressions - Azure Resource Manager

Describes the declarative JSON syntax for Azure Resource Manager templates (ARM templates).

@tero You are correct. The overall data is pickle but the API is returning that Azure string in lieu of an actual hostname for a single host out of over 18,000 with hundreds of other Azure VPCs. I don't know why it only screws up that one host.

@jrdepriest the quotes thing is likely not a regular expression but something deeper in the Chomsky hierarchy.

But, if you have dicts and lists, you can use json.dumps to output them as… something resembling JSON. It needs some coercing to avoid illegal output for some values:

import json #[…] def python_json_module_sucks(ignored): raise StopIteration(-1) #[…] def read_json(inputfilename): with open(inputfilename) as fh: object = json.load(fh, parse_constant=python_json_module_sucks) return object def write_json(outputfilename, object): with open(outputfilename, 'w') as fh: json.dump(object, fh, allow_nan=False, indent=2, sort_keys=True) fh.write('\n')

@mirabilos unfortunately, the Python embedded in Alteryx can only send data back to the main program as a pandas data frame. I have used json.dumps to get data out in JSON when it's not embedded in Alteryx but I can't do that here.

There's always something convoluted like converting from Pickle to JSON, then concat and convert the entire row into a string, then base64 encrypting it before sending it out as a data frame. Once back in Alteryx, I could base64 decrypt each row string, split it back out to columns, then send it to the JSON parser.

I've done that before, too; just not in Alteryx.

@jrdepriest eurgs.

Sounds like this needs an interactive debugging session; without knowing Alteryx or having that oneself this seems hard to do “dry”.

(I’d likely find something that does useful translations on both sides before the conversion to Pandas and back trashes lists and dicts; the Python str() representation is not good, and while repr() is (if it uses that), just evaling things is a recipe for disaster and “bobby tables”.)

@mirabilos LOL yeah at that last part, "unintended consequences".

It's funny that their official reply to these missing features in the main Alteryx program is for you to use the embedded Python, but Alteryx can't interpret Python lists or dicts, only single values with one datatype. That severely limits its usefulness.

Alteryx is great, except when it isn't. I have a habit of finding the borders of usefulness in every application I use. Somehow, over and over, I'm repeatedly the first person to ever ask about doing X.

I wouldn't need to go down this rabbit hole in the first place if Alteryx let me customize the TLS and certificate configuration for how it makes API calls. The only reason I have to use Python is because the native download widget can't properly negotiate the HTTP encryption settings for the SaaS platform that holds the data I need. Every other tool can: PowerShell, Python, curl, Postman, Bruno, Databricks, our SOAR. Only Alteryx can't. I almost think it might be easier to dive in and learn Python better and do everything there but I would rather use the old Alteryx workflows that have been chugging along successfully with the old data source for three years.