🧐 Behold! The groundbreaking revelation that "fine-grained" scatter-gather is the secret sauce for #processing massive graphs in-memory 🎉. Because obviously, nothing screams 'cutting-edge' like a paper with an abstract longer than the Great Wall of China 🏰.
https://arxiv.org/abs/2503.05116 #finegrained #scattergather #massivegraphs #inmemory #cuttingedge #research #HackerNews #ngated
Piccolo: Large-Scale Graph Processing with Fine-Grained In-Memory Scatter-Gather

Graph processing requires irregular, fine-grained random access patterns incompatible with contemporary off-chip memory architecture, leading to inefficient data access. This inefficiency makes graph processing an extremely memory-bound application. Because of this, existing graph processing accelerators typically employ a graph tiling-based or processing-in-memory (PIM) approach to relieve the memory bottleneck. In the tiling-based approach, a graph is split into chunks that fit within the on-chip cache to maximize data reuse. In the PIM approach, arithmetic units are placed within memory to perform operations such as reduction or atomic addition. However, both approaches have several limitations, especially when implemented on current memory standards (i.e., DDR). Because the access granularity provided by DDR is much larger than that of the graph vertex property data, much of the bandwidth and cache capacity is wasted. PIM is meant to alleviate such issues, but it is difficult to use in conjunction with the tiling-based approach, resulting in a significant disadvantage. Furthermore, placing arithmetic units inside a memory chip is expensive, so supporting multiple types of operations is thought to be impractical. To address the above limitations, we present Piccolo, an end-to-end efficient graph processing accelerator with fine-grained in-memory random scatter-gather. Instead of placing expensive arithmetic units in off-chip memory, Piccolo focuses on reducing the off-chip traffic with a non-arithmetic function-in-memory of random scatter-gather. To fully benefit from in-memory scatter-gather, Piccolo redesigns the cache and MHA of the accelerator such that it can enjoy the advantages of both tiling and in-memory operations. Piccolo achieves a maximum speedup of 3.28× and a geometric mean speedup of 1.62× across various and extensive benchmarks.
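
To make the access-granularity point in the abstract concrete, here is a rough sketch of how much fetched data is actually useful when small vertex properties are read through a coarse memory interface. The 64-byte access size and 4-byte property size are illustrative assumptions, not figures from the paper, and `useful_fraction` is a hypothetical helper that models only a single most-recently-fetched line, not a real cache or Piccolo's design.

```python
def useful_fraction(vertex_ids, prop_bytes=4, line_bytes=64):
    """Return the share of fetched bytes that were requested vertex data.

    Assumptions (not from the paper): each memory access returns one
    aligned line of `line_bytes`, and only the most recently fetched
    line is kept on chip.
    """
    if not vertex_ids:
        return 0.0
    props_per_line = line_bytes // prop_bytes
    fetched_bytes = 0
    last_line = None
    for v in vertex_ids:
        line = v // props_per_line
        if line != last_line:
            # A different line is needed, so a full line is fetched.
            fetched_bytes += line_bytes
            last_line = line
    return len(vertex_ids) * prop_bytes / fetched_bytes


# Sequential vertex IDs reuse every byte of each fetched line.
sequential = list(range(64))
# IDs spaced 16 apart each land on a different 64-byte line.
scattered = [i * 16 for i in range(64)]

print(useful_fraction(sequential))
print(useful_fraction(scattered))
```

Under these assumptions the scattered pattern uses only 4 of every 64 bytes fetched, which is the kind of waste that fine-grained scatter-gather is meant to reduce.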


In our effort to put courses online, we continue the lectures for the Algorithmic Lower Bounds course. You can now watch:

Lessons 4-11: Algorithmic Lower Bounds by Mohammad Hajiaghayi - NP-Completeness and Beyond

(FEEL FREE TO SUBSCRIBE ON YOUTUBE @hajiaghayi FOR FUTURE LESSONS, PREMIERING ON WEDNESDAYS)

https://youtu.be/VZyffnAb1r0 (Lesson 4: 3-Partition Problem & Proving NP-Hardness)

https://youtu.be/4fCD9_1eQw0 (Lesson 5: Puzzle Problem NP-Hardness & 3-Partition)

https://youtu.be/FIyEj72-UJQ (Lesson 6: 3-SAT Problem & Proving NP-Hardness)

https://youtu.be/tbSJzaKx2pA (Lesson 7: Puzzle Problem NP-Hardness via 3-SAT)

https://youtu.be/voRVebBsh94 (Lesson 8: Fine-grained Subcubic Complexity: Part 1)

https://youtu.be/gRURSM6QARo (Lesson 9: Fine-grained Subcubic Complexity: Part 2)

https://youtu.be/qPw82bTAXkc (Lesson 10: Fine-grained Subquadratic Complexity 1)

https://youtu.be/C6j4avVkI7U (Lesson 11: Fine-grained Subquadratic Complexity 2)

#AlgorithmicComplexity,

#3SAT,

#3Partition,

#subquadratic,

#subcubic,

#Finegrained,

#HardnessExploration,

#NP,

#PSPACE,

#NPComplete,

#LogSpace,

#ExponentialComplexity,

#ParallelComputation,

#PvsNP,

#NPSPACE,

#NonDeterministicSpace,

#SavitchTheorem,

#ComplexityClasses,

#Reductions,

#ImportantProblems,

#CommunicationComplexity,

#GeometricProblems,

#AlgorithmDesign,

#ComputationalComplexity,

#TheoreticalComputerScience,

#AlgorithmicLowerBounds

For comprehensive handwritten lecture notes on this course, visit the instructor's website:

http://www.cs.umd.edu/~hajiagha/
The course textbook "Computational Intractability: A Guide to Algorithmic Lower Bounds" by Demaine, Gasarch, and Hajiaghayi is available for free at:

https://hardness.mit.edu/
