Well, that's one more thing Kosara and I disagree about.

Standard datasets _are_ important. You literally don't want to be surprised by the dataset when you're trying to understand a technique.

Synthetic examples serve a similar purpose: you _know_ what's in them, so you can check that your technique is doing what you expect.

If you design a new technique and show it on a new dataset, readers can't separate one from the other.

@scheidegger I agree standard datasets are useful for those reasons. But I'm genuinely skeptical of techniques demonstrated on synthetic datasets. There are too many cases where techniques have wildly different results on synthetic and real datasets because the synthetic datasets are too abstract. E.g. most graphs are not nearly-uniform meshes, and Watts-Strogatz/Barabási-Albert graphs have substantial topological differences from real social nets, but these are standard graph layout datasets.
@rpgove well yeah, you don't only show things on synthetic datasets. But you also don't only show things on "real" datasets.
@scheidegger But people *do* evaluate things only on synthetic data, and then we all treat it as authoritative, e.g. readability of node-link vs. matrix-based diagrams (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.180.5768&rep=rep1&type=pdf). But I take your point that both standard and synthetic data can be useful and serve a purpose. I just object when synthetic is the only means of evaluation, which happens too often.
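The topological gap between synthetic graph generators and real social networks that comes up in this thread is easy to check for yourself. A minimal sketch using networkx (the graph sizes and generator parameters here are arbitrary, chosen only for illustration): Watts-Strogatz graphs have a nearly uniform degree distribution, while Barabási-Albert graphs get heavy-tailed degrees but almost no clustering, and real social networks typically show both heavy tails *and* high clustering.

```python
import networkx as nx

# Illustrative parameters only; not taken from any particular study.
n = 1000
ws = nx.watts_strogatz_graph(n, k=10, p=0.1, seed=42)  # small-world model
ba = nx.barabasi_albert_graph(n, m=5, seed=42)         # preferential attachment

# WS: every node starts with exactly k neighbors, so degrees stay
# tightly bunched, unlike the heavy-tailed degrees of real social nets.
ws_degrees = [d for _, d in ws.degree()]
ba_degrees = [d for _, d in ba.degree()]
print("WS degree range:", min(ws_degrees), "to", max(ws_degrees))
print("BA degree range:", min(ba_degrees), "to", max(ba_degrees))

# BA: clustering is near zero, while real social networks tend to have
# high clustering (friends of friends are often friends). WS keeps the
# high clustering of its ring lattice but loses the degree heterogeneity.
print("WS avg clustering:", round(nx.average_clustering(ws), 3))
print("BA avg clustering:", round(nx.average_clustering(ba), 3))
```

Neither generator reproduces both properties at once, which is the substance of the objection: a layout technique tuned on these graphs may behave quite differently on a real social network.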