What #linux #software can I use to do Correspondence Analysis of 2D data matrices?

#statistics

@mrundkvist The first two that spring to mind are https://octave.org/ & https://posit.co/download/rstudio-desktop/

But it might be simpler to use Python with NumPy and pandas.

Let me know if you need a hand.

@greenboxcode Thank you! I can't code and I don't understand the underlying math. I am used to having a black box (under Windows) into which I toss a spreadsheet, and then a scattergram pops out of the other side of the box. What happens inside the box is completely opaque to me.
@mrundkvist @greenboxcode Which black box?

@noctuaminervae @greenboxcode

The Bonn Archaeological Statistics Package.

@mrundkvist @greenboxcode Ooh, that package looks like a subject for archaeological study itself. 🙂 If it’s what you’re comfortable with, you *could* try to get it to run under Wine, perhaps.

The PSPP manual lists correspondence analysis under “parts of the pspp language that are not yet implemented”, which sounds … not great.

I find R repays study, though (as no doubt do Octave or pandas/NumPy), and what you want can probably be done in a few lines of code with the right packages installed.

@mrundkvist I see. LibreOffice Calc might be a good alternative to MS Excel. For reference, background on Correspondence Analysis can be seen at https://en.wikipedia.org/wiki/Correspondence_analysis

How big is your data set?

@greenboxcode Maybe 300 by 50? Ones and zeroes.

@mrundkvist I would worry that any generic 'black box' solution might simplify the clustering and weighting. I think a custom solution might be best. https://maxhalford.github.io/prince/mca/ could be used with Python to help you.

Don't be too quick to shy away from learning to code a bit. Python is accessible to a lot of people. Good luck.
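
To give a sense of how little code is involved, here is a minimal sketch of plain correspondence analysis using NumPy alone. It is not Prince's implementation. It takes the textbook route of a singular value decomposition of the standardized residuals. The function name `correspondence_analysis` and the random 300 by 50 matrix of ones and zeroes are made up for illustration; a real table would be loaded from the spreadsheet instead.

```python
import numpy as np


def correspondence_analysis(table, n_components=2):
    """Simple correspondence analysis of a non-negative 2D table via SVD.

    Hypothetical helper for illustration, not taken from any library.
    """
    N = np.asarray(table, dtype=float)
    # CA is undefined for all-zero rows or columns, so fail early.
    if (N.sum(axis=1) == 0).any() or (N.sum(axis=0) == 0).any():
        raise ValueError("remove all-zero rows and columns first")

    P = N / N.sum()        # correspondence matrix (cells as proportions)
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses

    # Standardized residuals from the independence model r * c^T.
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)

    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components

    # Principal coordinates for rows and columns.
    rows = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :k] * sv[:k]) / np.sqrt(c)[:, None]

    # Share of total inertia carried by each retained axis.
    inertia = sv ** 2
    explained = inertia[:k] / inertia.sum()
    return rows, cols, explained


# Made-up stand-in for a 300 x 50 presence/absence matrix.
rng = np.random.default_rng(0)
data = (rng.random((300, 50)) < 0.2).astype(int)
data = data[data.sum(axis=1) > 0]       # drop empty rows
data = data[:, data.sum(axis=0) > 0]    # drop empty columns

rows, cols, explained = correspondence_analysis(data)
```

The first two columns of `rows` and `cols` are the x and y positions for the usual scattergram, and `explained` says how much of the total inertia those two axes account for.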

@greenboxcode Might GNU PSPP be a useful tool?