Data discovery software developer at JMP Discovery, LLC. Focused on data visualization and exploration. Prefer smoothers over fitted lines. Views my own.
Creator of Graph Builder UI within JMP.
Creator of Packed Bars chart type for high-cardinality Pareto data.
#DataViz #DataScience #TieDye #LessIsMore
Blog: https://rawdatastudies.com
Packed Bars: https://packedbars.com
Bluesky: https://bsky.app/profile/xangregg.bsky.social

RE: https://rawdatastudies.com/2026/01/19/seeing-uniformity/

New blog post checking two years of authentication codes against a uniform distribution with #dataviz.

My venture into quintile area strip plots ended up giving me greater appreciation for the quiet strengths of box plots. #dataviz New blog post: https://rawdatastudies.com/2026/01/02/data-strips-quintiles-vs-box-plots/
Data Strips: Quintiles vs. Box Plots – Raw Data Studies

Experiments with a new “quintile area” strip plot, prompted by skewed box plots in a biology paper, ended up clarifying why box plots remain so robust.

RE: https://rawdatastudies.com/2025/12/29/greenland-ice-melt-history/

Anyone remember the alarming Economist chart from 2019 that accidentally taught us all a lesson on individual observations vs group averages? My deeper #dataviz dive with latest data:

New blog post: It started with me trying to understand some radar charts, but getting there required a side-quest of fitting a panel of sigmoidal curves. https://rawdatastudies.com/2025/12/15/from-radar-charts-to-curve-fitting-and-back/

Apologies, but my conference trip report was delayed because it started out as a list of papers I liked and ended up being a 3.5k word rant that stops a hair's breadth away from calling for propaganda of the deed.

Anyway, here are some words about IEEE VIS 2025:

https://mcorrell.medium.com/nestbeschmutzer-a-quasi-trip-report-from-ieee-vis-2025-265866b0b292

Nestbeschmutzer: A Quasi-Trip Report from IEEE VIS 2025

I recently returned from IEEE VIS, held this year in Vienna, Austria. Somewhat recently, I should say. I’m slow enough to write these kinds…

Slightly worried that sharing this morning will reveal that I spent (another) Saturday making charts and writing a blog post. #dataviz https://rawdatastudies.com/2025/11/22/beeswarm-attack/

NCAA football team draft rates

When I see a straight-line fit, I always wonder if it falls into the ask-for-a-line category. In particular, I wonder whether the trend is really significant and really linear.

It always helps when the raw data is available, and the original Reddit post does eventually point to the data [CSV], though I had to go through a chatbot to see it. The raw data nicely covers 44,000+ players graduating high school between 2005 and 2025. For each player there are several fields, including a composite rating, what college they played for, and whether they were drafted by the NFL or not. The general idea of the graph is to show how draft likelihood depends on the high school rating and school. Presumably schools that are far above or below the overall trend line do better or worse at preparing players to be drafted.

Player level trends

Since the raw data is at the player level instead of the team level, I’ll start there to provide a sense of the data. This chart is constrained to teams in the SEC to keep the number of dots manageable for one chart (5000 out of 44,000 here).

The composite ratings are on the x axis and the coloring is the star-system rating. I’m surprised the star rating and the composite ratings align so well. I was under the impression that the composite ratings were relative within year and the stars were more absolute, but I could be wrong or maybe there isn’t that much year-to-year variability in player quality.

In any case, we learn that the ratings mostly range from 0.7 to 1.0, and we see spikes at regular intervals at the low end. I think that means those ratings are coarser; for instance, perhaps each year's top 100 players are ranked precisely and the others are assigned to tiers. We can make some rough assessment of the draft rates already, and we can do better if we smooth the dot jitter a bit.

Now it’s easier to sense that the proportion drafted is correlated with the rating. For instance, over half of the black dots (5-star ratings) are in the top, drafted=true, section, but far less than half for the other groups.

For all my charts (except the next one), I only include players that graduated high school in 2021 or before, though the data set goes through 2025. The original chart goes through 2022, which does capture some players who left school after only 3 years but also counts all 4-year players as not-drafted. (The raw data set doesn’t indicate whether players are still in school or not.) To confirm that, here’s a plot of the proportion of 4-star players drafted by year of high school graduation.

Even the 2021 rate may be lessened by players still in school, but certainly 2022 is affected, so I think it's right to exclude 2022.
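For readers who want to run the same check outside a charting tool, here is a minimal Python sketch of the per-year draft proportion. The field names (`grad_year`, `stars`, `drafted`) and the records are made up for illustration and are not the column names or values in the actual CSV.

```python
from collections import defaultdict

def draft_rate_by_year(players, stars=4):
    """Return {grad_year: proportion drafted} for players with the given star rating."""
    counts = defaultdict(lambda: [0, 0])  # grad_year -> [drafted count, total count]
    for p in players:
        if p["stars"] != stars:
            continue
        tally = counts[p["grad_year"]]
        tally[0] += 1 if p["drafted"] else 0
        tally[1] += 1
    return {year: d / n for year, (d, n) in sorted(counts.items())}

# Made-up records purely for illustration; hypothetical field names.
players = [
    {"grad_year": 2020, "stars": 4, "drafted": True},
    {"grad_year": 2020, "stars": 4, "drafted": False},
    {"grad_year": 2022, "stars": 4, "drafted": False},
    {"grad_year": 2022, "stars": 4, "drafted": False},
    {"grad_year": 2022, "stars": 5, "drafted": True},
]
rates = draft_rate_by_year(players)
print(rates)
```

A sharp drop in the most recent graduation years would be the signature of players who are still in school being counted as not drafted.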

Even with the raw, unsummarized data, we can remap the drafted state to a 0 or 1 numeric value and fit a trend curve against rating for all 44,000 players.

This suggests that, at least at the player level, the trend is not a straight line, even within the original 0.8 to 1.0 range. The slope changes at around 0.85 and then more sharply just past 0.95.
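As a rough illustration of fitting a trend to 0/1 outcomes, the sketch below uses a Gaussian kernel smoother (Nadaraya-Watson). That is a stand-in I chose for simplicity, not necessarily the smoother behind the chart, and the ratings, outcomes, and bandwidth are made-up values.

```python
import math

def kernel_smooth(xs, ys, grid, bandwidth=0.02):
    """Nadaraya-Watson estimate: Gaussian-weighted mean of ys near each grid point."""
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        fitted.append(sum(w * y for w, y in zip(weights, ys)) / total)
    return fitted

# Made-up ratings and draft outcomes, for illustration only.
ratings = [0.80, 0.82, 0.85, 0.88, 0.90, 0.93, 0.96, 0.98, 0.99]
drafted = [False, False, False, True, False, False, True, True, True]

# Remap the drafted state to a numeric 0 or 1 before smoothing.
y = [1.0 if d else 0.0 for d in drafted]

grid = [0.80, 0.90, 0.99]
curve = kernel_smooth(ratings, y, grid)
print(curve)
```

Because each fitted value is a weighted average of 0s and 1s, the curve stays between 0 and 1, which a straight-line fit does not guarantee.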

Team level comparisons

Here’s a view of drafted proportion versus star ratings by team. Teams are ordered by descending overall draft percentage, with only the top 20 shown.

The black lines represent the overall averages (and so they're at the same positions in each panel). Some of the 5-star proportions have very wide confidence intervals since the counts are very small. For instance, South Carolina only had four 5-star players during this period, but all four were drafted, so it has a 100% draft rate with a wide confidence interval.
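To see how wide an interval for 4 out of 4 can be, here is a small sketch using the Wilson score interval at roughly 95% confidence. The choice of Wilson is my assumption for illustration; the chart's intervals may have been computed with a different method.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)

# Four 5-star players, all four drafted.
low, high = wilson_interval(4, 4)
print(f"{low:.2f} to {high:.2f}")
```

With only four players, the lower bound lands near 0.51, so a 100% observed rate is still consistent with a true rate not much above one half.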

Finally, here’s a close reproduction of the original but using a spline smoother instead of a straight line fit. Instead of white, I used the school’s secondary color for the text, but admittedly it doesn’t always work well (gray OSU text over scarlet circle). I also added Notre Dame, which was inadvertently left off of the original.

At this aggregation level, the curve and the line are not that different, so my only complaint about a straight line fit is the way it suggests going below 0 for values less than 0.8. I also tried school logos as marks.

The school colors and logos weren’t in the original data set, but I was able to employ ChatGPT to collect them for me, which worked well enough for the larger schools.

My new blog post exploring data from a paper comparing step count changes of people who move to cities with different walk scores. #dataviz https://rawdatastudies.com/2025/08/23/step-count-versus-city-walkability/
Need to get the raw data behind a chart? Here's a walk-through of my PDF → SVG → CSV → Data technique. #dataviz
https://rawdatastudies.com/2025/07/21/data-extraction-challenge/
Data extraction challenge – Raw Data Studies

Throughout my quests for raw data, I've learned a few techniques for finding data lurking behind the charts. This walk-through shows a few of them,