Mastodawn

Qwyrdo, Geometer (& β)

Looking for some help creating a dictionary in Python from a CSV.

E.g. going from

"'A', 'B', 'C', 'D'", "[['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h']], [['i']], [['jkl']]"

"'E', 'F', 'G'"," [['m']], [['n', 'o']], [['p', 'q'], ['r', 's'], ['t', 'u', 'v']]"

to

{('A', 'B', 'C', 'D') : [['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h']] ...,

('E', 'F', 'G') : [['m']], ... }

I can open the file and do a .readlines pass on it fine.

I can strip the line feeds inherent in a CSV using .replace

But I can't figure out how to get the first item of each line of the CSV into a tuple and the second into a list. Just about everything I've tried seems to convert the parts into strings, which I can't figure out how to un-convert.

Any ideas?

Keep in mind I am at best a novice programmer. I've looked at e.g. https://www.geeksforgeeks.org/python-convert-a-list-to-dictionary/ and not been able to figure out how to get from their examples to what I need for this particular project.
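One possible way to "un-convert" such strings is `ast.literal_eval` from the standard library. This is only a sketch, and it assumes each cell holds text that is already valid Python literal syntax, as in the sample rows above. The variable names are made up for illustration.

```python
import ast

# Two cells as they might arrive from the CSV: plain strings.
key_text = "'A', 'B', 'C', 'D'"
value_text = "[['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h']], [['i']], [['jkl']]"

# literal_eval parses Python literals (strings, lists, tuples, ...)
# without running arbitrary code, unlike eval().
key = ast.literal_eval(key_text)             # comma-separated items become a tuple
value = list(ast.literal_eval(value_text))   # tuple of nested lists -> list

lookup = {key: value}
print(type(key).__name__, type(value).__name__)
```

One caveat: a cell holding a single item with no comma (just 'A', say) would come back as a bare string rather than a one-element tuple, so that case would need separate handling.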


Qwyrdo, Geometer (& β) Jan 3, 2023

I've gotten good suggestions so far, but seem to only be exchanging one problem for another and am approaching my wits' end.

Someone suggested using JSON as a data format and using the json module to parse it (along with some code to do just that).

But upon calling json.loads() I'm getting an error complaining that there's no ':' delimiter... where there actually is one!

Data set currently looks like this, without the line breaks and spacing (added here to make the structure more clear):

{
"'A', 'B', 'C', 'D'" :
[[["a", "b", "c"], ["d", "e"], ["f", "g", "h"]], [["i"]], [["jkl"]]]
}

(... plus another similar entry)

I am really confused!

If you choose to help--which would be very much appreciated--please keep in mind that I'm at best a novice programmer. I learn best by example (e.g. code or pseudo-code).

enne 💤 Jan 3, 2023

@Qwyrdo I suspect it might be the next similar item you've elided. https://jsonlint.com/ has no complaints about the snippet you posted.


Qwyrdo, Geometer (& β) Jan 3, 2023

@picklish Thanks, and d'oh! You're right. I was misled by where the error claimed the problem was. It was, in fact, the next entry; I missed a closing double quote.

There's still the issue of successfully parsing the data, but I don't have the brain for that at the moment (and need to switch gears to another task anyway).

Thanks again!
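A small sketch of why the reported location can be misleading: when a key is missing its closing double quote, the parser keeps reading until the next double quote it finds, and only complains afterwards. The two-entry document below is invented for illustration and is not the actual data set.

```python
import json

# Hypothetical two-entry document; the second key lacks its closing quote.
text = '{"k1": ["a"], "k2: ["b"]}'

try:
    json.loads(text)
    message, position = None, None
except json.JSONDecodeError as err:
    # err.msg is the short description; err.pos is the character offset.
    message, position = err.msg, err.pos

print(message, "at character", position)
print("text just before the error:", text[:position])
```

Printing the text just before the reported offset shows what the parser had read when it gave up, which can make a missing quote easier to spot.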

Pete Bleackley Jan 3, 2023

@Qwyrdo "A","B","C","D" isn't a valid JSON key. However, in a Python dictionary literal, it would be parsed as a tuple, which is a valid dictionary key in Python.

What do your keys and values represent?
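To illustrate the point about keys: JSON object keys must be strings, so a tuple key has to be turned into a string on the way out and back into a tuple on the way in. The sketch below assumes a comma separator that never occurs inside a key element; `SEP`, `table`, and `restored` are illustrative names only.

```python
import json

# Tuple keys work in a Python dict...
table = {("A", "B", "C", "D"): [[["a", "b", "c"], ["d", "e"]], [["i"]]]}

# ...but JSON object keys must be strings, so join each tuple into one string.
SEP = ","  # assumed separator; must not occur inside any key element
as_json = json.dumps({SEP.join(key): value for key, value in table.items()})

# Loading reverses the step: split each string key back into a tuple.
restored = {tuple(key.split(SEP)): value
            for key, value in json.loads(as_json).items()}
print(restored == table)
```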

Qwyrdo, Geometer (& β) Jan 4, 2023

@PeteBleackley@wandering. Thanks for following up!

I've been really busy with Life Stuff today so my brain is kind of fried at the moment, but hopefully I'll make some sense here.

I think I figured out the key : value data formatting, at least on the JSON side of things. { "'A', 'B', 'C'..." : [list of lists] }

Have yet to get the code you provided to work correctly. The "data = {tuple([item for item in row..." operation tosses back an error about 'dict' being unhashable. Have yet to try further troubleshooting.

As for what the keys and values are: the keys in my data are root word ending matches (e.g. 'A' matches 'ABBA', 'KARMA', etc.). The values are the suffix paradigms that follow, as per previous work on this project. So I'm still trying to build words like ABBAadf, ABBAbdf, ABBAcdf, ABBAaef, etc.

The context for all this is an art project to generate words based on the Voynich Manuscript. Ultimately I'm working with a non-Latin font. Using LibreOffice for that.

Qwyrdo, Geometer (& β) Jan 4, 2023

Going to reframe this a little, as there are a few elements in play that I think are muddying the waters, both for me and those trying to help.

I have a LibreOffice .ods with two columns.

Column 1 has data like

'A', 'B', 'C'

Column 2 has data like

[['a', 'b', 'c'], ['d', 'e']], [['f']], [['g', 'h'], 'ijk'] ...

I want to build a dictionary out of this file.

LibreOffice can export .csv or Excel 2007 .xlsx

I could manually rearrange the data in a .txt file if need be, but cramming it all into one line would be impractical.

In Python, I need column 1 formatted as a tuple so elements in it can be matched with

for Key, Value in dictionary.items():
    if Root[-1] not in Key:
        continue
    (do further processing if the key matches)

For that "further processing" I need nested lists for my code to work, because I'm building strings based on the product of the first level of lists.
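The matching loop and the "product" step might fit together roughly as follows. This is a sketch under assumptions: the sample data is invented, every paradigm is assumed to be a list of lists (one inner list per suffix slot), and `build_words` is a hypothetical helper name.

```python
from itertools import product

# Hypothetical sample data in the shape described above.
dictionary = {
    ("A", "B", "C"): [[["a", "b", "c"], ["d", "e"], ["f", "g", "h"]], [["i"]]],
    ("E", "F"): [[["m"]]],
}

def build_words(root, dictionary):
    """Return root + suffix strings for every key containing the root's last letter."""
    words = []
    for key, paradigms in dictionary.items():
        if root[-1] not in key:
            continue
        for paradigm in paradigms:
            # product picks one element from each inner list (slot) in turn.
            for parts in product(*paradigm):
                words.append(root + "".join(parts))
    return words

words = build_words("ABBA", dictionary)
print(len(words), words[:3])
```

`itertools.product` stands in for nested for-loops here, so the number of slots in a paradigm does not need to be known in advance.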

(continued as unlisted)

Qwyrdo, Geometer (& β) Jan 4, 2023

I tried using a

with open('file.csv', 'r') as fi

clause, but couldn't get anything but strings out of it.

At someone's suggestion I installed pandas, but it was balking at... I honestly forget, now. I've tried so many ways of data wrangling, all to no avail.

I've tried manually formatting the .csv as a .txt file and opening that with the json module, but I'm not versed in the intricacies of that data paradigm and might be getting it wrong.

In short, I--a novice programmer faced with a surprisingly hard problem--can't figure this out.

Guidance on:

- reformatting my data to play well with some module or feature of Python;

- implementing some module or feature of Python to get the data into the format I need

would be extremely helpful.

I learn best by example, e.g. pseudo-code.

And please keep in mind, at heart I'm an artist trying to incorporate some data processing into my work. Coding is not my forte.

Thank you for reading.
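One possible route, sketched with only the standard library: let the `csv` module split each line into its two quoted cells, then let `ast.literal_eval` turn each cell's text into real Python objects. It assumes the export wraps each cell in double quotes and that the cell text is valid Python literal syntax. `load_table` and `sample_text` are illustrative names, and the sample rows are shortened from the ones posted above.

```python
import ast
import csv
import io

# Stand-in for the exported file. With a real file, pass
# open("file.csv", newline="") to load_table instead (file name assumed).
sample_text = '''\
"'A', 'B', 'C', 'D'","[['a', 'b', 'c'], ['d', 'e']], [['i']]"
"'E', 'F', 'G'","[['m']], [['n', 'o']]"
'''

def load_table(handle):
    """Build {tuple_of_endings: list_of_paradigms} from a two-column CSV."""
    table = {}
    # skipinitialspace tolerates a space after the comma between cells.
    for row in csv.reader(handle, skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        key_text, value_text = row
        # Wrapping the text guarantees a tuple / list even for a single item.
        key = ast.literal_eval("(" + key_text + ",)")
        value = ast.literal_eval("[" + value_text + "]")
        table[key] = value
    return table

table = load_table(io.StringIO(sample_text))
for key, value in table.items():
    print(key, "->", value)
```

The resulting keys are tuples, so a test like `Root[-1] not in Key` works on them directly, and the values are nested lists rather than strings.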

Pete Bleackley Jan 3, 2023

@Qwyrdo I'd use the pandas library for this. If you don't have it already, install it with pip

import pandas
raw = pandas.read_csv(filename, header=None)

data = {tuple([item for item in row if not isinstance(item, list)]):
        [item for item in row if isinstance(item, list)]
        for (_, row) in raw.iterrows()}

Qwyrdo, Geometer (& β) Jan 3, 2023

@PeteBleackley Thanks again for your help!

It took me a while to figure out how to install pandas (this is all new to me) but eventually I got it and ran the code you so kindly provided for me.

I even think I understand how the code works! It checks each "cell" in each row to see if it's a list; if it's not a list, it puts it in the dictionary key, otherwise it goes into the associated value. (Though I'm not sure about the (_,row) part.)

There seems to be a bit of a problem, though. Each cell seems to be getting read in as a string, so I'm getting really long strings as the key, and an empty list [] as the value.

Trying to look at the pandas documentation to see if there's a way to change this behavior. So far I'm not finding anything, but the documentation is *enormous*. Trying to find a needle in a haystack, with no magnet!
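On the `(_, row)` part and the empty list, here is a plain-Python sketch that uses no pandas. The `pairs` list is a made-up stand-in for the (index, row) pairs that `iterrows()` yields, and each cell is a string, as observed above.

```python
# Hypothetical stand-in for what iterrows() yields: (index, row) pairs.
pairs = [(0, ["'A', 'B'", "[['a'], ['b']]"]),
         (1, ["'E'", "[['m']]"])]

rows = []
for (_, row) in pairs:
    # Each pair is unpacked into two names; "_" is a conventional name
    # for a value that is deliberately ignored (here, the row index).
    rows.append(row)

# Every cell is still text, so the isinstance(item, list) test never passes.
list_cells = [item for row in rows for item in row if isinstance(item, list)]
print(list_cells)
print(isinstance("[['m']]", list))  # text that merely looks like a list
```

A string that looks like a list is still a string, which is why every cell ends up in the key and the value stays empty.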

Qwyrdo, Geometer (& β) Jan 3, 2023

@PeteBleackley Oh, wait. My bad; the CSV is getting read as intended. The double quotes are part of the export from LibreOffice; I'd figure, since there are commas in each part of the data, the individual columns would need delimiters of some kind that *aren't* commas. Otherwise I'd be trying to parse a file that just looked like

'A', 'B', 'C', 'D', [['a', 'b', 'c'] etc. ...

But I presume I'd be running into the problem of "how to separate data from columns" even without the double quotes as delimiters...
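A short sketch of what the double quotes do: a CSV-aware reader treats commas inside a quoted cell as ordinary text, while a plain `split(",")` breaks on every comma. The sample line is invented to mirror the export format described above.

```python
import csv

line = '''"'A', 'B', 'C'","[['a', 'b'], ['c']]"'''

naive = line.split(",")            # breaks on every comma, quoted or not
cells = next(csv.reader([line]))   # honours the double-quoted cells

print(len(naive), "pieces from split")
print(len(cells), "cells from csv.reader:", cells)
```

So the quotes from the export do keep the two columns apart. What remains is that each cell is still a string.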

Pete Bleackley Jan 3, 2023

@Qwyrdo Argh, it's separating the lists on the commas. Try the following

import json
with open("data.csv", 'r') as infile:
    raw = [json.loads('[{}]'.format(line))
           for line in infile]
data = {tuple([item for item in row if not isinstance(item, list)]):
        [item for item in row if isinstance(item, list)]
        for row in raw}

Qwyrdo, Geometer (& β) Jan 3, 2023

@PeteBleackley Ugh, I'm so sorry; I seem to be coming up with all the hard problems lately!

I tried the json method you provided. I'm observing the following:

If I export to CSV using double quotes as string delimiters (yielding e.g. "'A', 'B', 'C'", [[ending paradigm here]]) I get the same problem where the code recognizes "'A', 'B', 'C'" as a string--and thus doesn't parse it as a key.

If I export to CSV *without* the double quotes, I get a JSONDecodeError Expecting value: line 1 column 2 (char 1)

Starting to think maybe I'll have to hard code all the requisite key : value pairs by hand, which is technically doable but a pain in the tuchus...

Pete Bleackley Jan 3, 2023

@Qwyrdo It may be that json only recognises double quotes as string delimiters.
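That is how the JSON format is defined: strings must be wrapped in double quotes, and Python's `json` module follows that rule. A quick check, with made-up sample strings, is sketched below.

```python
import json

double_quoted = '["A", "B", [["a"]]]'
single_quoted = "['A', 'B', [['a']]]"

parsed = json.loads(double_quoted)   # JSON strings use double quotes

try:
    json.loads(single_quoted)
    error_text = None
except json.JSONDecodeError as err:
    error_text = str(err)

print(parsed)
print(error_text)
```

The parser gives up at the first single quote, right after the opening bracket, which fits the "line 1 column 2 (char 1)" location in the error reported above.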

Qwyrdo, Geometer (& β) Jan 3, 2023

@PeteBleackley Not sure if that's the case or not.

My instincts tell me the real problem is the fact that I need commas in my data.

If I encapsulate each cell in double quotes upon export, it seems exceedingly hard to get Python to recognize those data as anything but strings.

But if I don't use double quotes to hold each cell together, I end up with a line containing commas that somehow need to be interpreted differently depending on a context that I don't think can be accounted for in code.

I.e. "'A', 'B', 'C'", "[[ending paradigms that include commas]]" gets each chunk parsed as a string that I can't convert, or 'A', 'B', 'C', [[ending paradigms that include commas]] gets parsed with each comma-delimited element as a separate entity, which doesn't serve my data processing needs.

In other words, CSV just might not be the right option, and I need to find another file format to use somehow...

Pete Bleackley Jan 3, 2023

@Qwyrdo I think that json is probably better than csv for what you're trying to do.

Yingtai Jan 3, 2023

@Qwyrdo you are inspiring me to try to learn Python again, even though Microsoft Word is the only program that plays nice with my speech recognition software!