Mastodawn

Stephen Nixon Oct 2, 2023

“To train the model, I assembled a dataset of 71k distinct fonts.”

Hmm, I wonder if those fonts were all open-licensed...

https://serce.me/posts/02-10-2023-hey-computer-make-me-a-font

Hey, Computer, Make Me a Font

This is a story of my journey learning to build generative ML models from scratch and teaching a computer to create fonts in the process.

Jason Santa Maria Oct 2, 2023

@arrowtype My immediate thought too. And I fear things about both answers 😅

Stephen Nixon Oct 2, 2023

@jasonsantamaria ha, yes, same.

Kris Sowersby Oct 2, 2023

@arrowtype @jasonsantamaria https://github.com/SerCeMan/fontogen/issues/1#issuecomment-1743770915

Publish dataset? · Issue #1 · SerCeMan/fontogen

Hello, is there info about the dataset used for training the model? It's pretty important for some use cases (like publishing games on steam) to be sure that all training data is licensed appropria...

GitHub

Typographica Oct 4, 2023

@klim @arrowtype @jasonsantamaria “I suggest we avoid discussing this specific conversation branch any further”. good grief.

Stephen Nixon Oct 4, 2023

@typographica @klim @jasonsantamaria “the discussions can easily become heated without leading to a productive outcome.”

i.e. “the discussions aren’t going to help with large-scale IP theft and laundering”

Simon Cozens Oct 4, 2023

@arrowtype @typographica @klim @jasonsantamaria Oh good, I had a feeling *I* was the asshole. I mean, I still could be, but at least I'm not alone.

Stephen Nixon Oct 4, 2023

@simoncozens @typographica @klim @jasonsantamaria no, definitely, thank you for being articulate in defense of being thoughtful at the outset of such a project.

Frere-Jones Type Oct 4, 2023

@arrowtype @simoncozens @typographica @klim @jasonsantamaria Yes thanks for speaking up. We just posted a response to the GitHub discussion also. https://github.com/SerCeMan/fontogen/issues/1#issuecomment-1747486006

Stephen Nixon Oct 4, 2023

@frerejonestype 🔥 Thank you for contributing to the discussion! I hadn’t even realized quite how thoroughly pirated the data was.

Frere-Jones Type Oct 4, 2023

@arrowtype We figured there would be some heat from this, but from our perspective, it’s really important that as an industry we’re clear with developers what the legal boundaries are with using our copyrighted material in data sets, just as other creative industries are trying to do. Particularly with people who throw any consideration of this to the wind.

Stephen Nixon

@frerejonestype yes, 100%. The best time to defend against AI infringement is before it gets deployed. As I understand it, once a GPT model gets trained, it is impossible to retroactively remove input data.

Kris Sowersby Oct 4, 2023

@arrowtype @frerejonestype One stolen font is piracy, thousands are a dataset.

Jany Belluz Oct 5, 2023

@klim @arrowtype @frerejonestype nicely put! It seems to be the LLM motto, and it works for pictures and books too.

Mario Breskic Oct 5, 2023

@klim @arrowtype @frerejonestype Stealing from one person is theft, stealing from everyone is the Third Reich