We have an upcoming paper at ICML 2025.
📄Paper: https://www.eecg.utoronto.ca/~mcj/papers/2025.alspec.icml.pdf
⚙️Code: https://github.com/mcj-group/alspec
This gist: When a large language model (LLM) responds to a prompt by generating text (inference), one of the most important but slowest stages of the computation is called attention. One way to speed up attention is to use more compute devices (e.g., multiple GPUs or accelerators) working collaboratively on it (tensor parallelism). Unfortunately, when scaling to 8 or more devices, the communication between them overwhelms the benefit of their increased compute. Another way to speed up attention is to approximate its underlying math. Prior approximate-attention approaches work well for some inference tasks, but on others, like solving math problems, the quality of the generated text is poor.

We propose attention-level speculation, a technique that combines and enhances the multi-device and approximate approaches to speed up LLM inference. Attention-level speculation sometimes uses the output of the attention approximation and sometimes does not, verifying on the fly whether the approximation was of good quality. Using two devices, we overlap this verification with speculative downstream computation. Speculation succeeds for up to 90% of attention operations. Our experiments with Tenstorrent N150s suggest that combining attention-level speculation with tensor parallelism across 8 devices is up to 1.65x faster than using tensor parallelism alone on 8 devices.
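The speculate-then-verify idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the top-k attention approximation, the tolerance-based acceptance rule, and the `downstream` stand-in are all assumptions made for the sketch, and the two "devices" run sequentially here rather than in parallel.

```python
import numpy as np

def exact_attention(q, K, V):
    # Standard softmax attention for a single query vector q over keys K and values V.
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def approx_attention(q, K, V, k=8):
    # Hypothetical cheap approximation: attend only to the top-k keys by score.
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argsort(scores)[-k:]
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return w @ V[idx]

def speculative_attention(q, K, V, downstream, tol=1e-2, k=8):
    # Speculative path: run the cheap approximation and immediately
    # continue with the downstream computation (in the paper, this
    # overlaps with verification on a second device).
    approx_out = approx_attention(q, K, V, k)
    spec_result = downstream(approx_out)

    # Verification path: compute exact attention.
    exact_out = exact_attention(q, K, V)

    # Accept the speculation if the approximation was close enough.
    if np.linalg.norm(approx_out - exact_out) <= tol * np.linalg.norm(exact_out):
        return spec_result, True
    # Misspeculation: redo the downstream work with the exact output.
    return downstream(exact_out), False
```

On acceptance the speculative downstream result is kept for free; on rejection the cost is one redundant downstream pass, which is why a high speculation success rate is what makes the technique pay off.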
This project was led by Jack Cai, a recent BASc grad from University of Toronto Engineering and a former Tenstorrent intern. Jack pitched this as an undergraduate thesis project, and I did not think it would work. Shame on me, and amazing work and persistence by Jack. Many thanks to our co-authors Ammar Vora, Randalph Zhang, and Mark O'Connor.