Building a GPU from Scratch

An open-source journey from specification to silicon, streamed live.

Live Emulator Output
Input A[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
Input B[0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
Output[0.0, 11.0, 22.0, 33.0, 44.0, 55.0, 66.0, 77.0]
✓ Vector addition computed on PumpGPU emulator
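
For reference, the vecadd result above is just element-wise addition; a minimal Python check (the function name is illustrative, not the project's actual test harness) looks like this:

```python
def vecadd_reference(a, b):
    """Element-wise addition; the golden result the emulator output is compared against."""
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]

# The inputs and expected output shown in the emulator log above.
a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
b = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
assert vecadd_reference(a, b) == [0.0, 11.0, 22.0, 33.0, 44.0, 55.0, 66.0, 77.0]
```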

Development Roadmap

From software emulation to hardware implementation

25-Day Sprint
Phase 0: Vision + Definitions
3 days · Complete

Without a clear spec, we'll waste months on rework. The spec is the contract between all components: assembler, emulator, and eventually RTL must agree on every bit.
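
To make "agree on every bit" concrete, a shared encoding table is the kind of artifact SPEC.md pins down. The sketch below is purely hypothetical (placeholder opcodes and field widths, not the real encoding):

```python
# Hypothetical 32-bit instruction layout: [opcode:8][dst:8][src1:8][src2:8].
# Placeholder values only; the real encoding lives in SPEC.md.
OPCODES = {"NOP": 0x00, "ADD": 0x01, "LD": 0x10, "ST": 0x11}

def encode(op, dst=0, src1=0, src2=0):
    """Pack one instruction word; assembler and emulator must share this table exactly."""
    return (OPCODES[op] << 24) | (dst << 16) | (src1 << 8) | src2

def decode(word):
    """Inverse of encode(); used by the emulator and, later, the disassembler."""
    return word >> 24, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
```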

Deliverables
  • SPEC.md v0 with ISA encoding table
  • Memory model documentation (global, shared, registers)
  • Kernel launch ABI defined
  • Repository CI green
  • Emulator boots and executes NOP
Exit Criteria
  • SPEC.md v0 committed and reviewed
  • Emulator compiles and runs empty kernel
  • CI pipeline passes
  • At least one stream episode completed
Risks & Mitigations
  • Over-engineering ISA → Start with minimal ops, add later
  • SIMD vs SIMT paralysis → Commit to SIMD v0, document SIMT path
  • Scope creep → Timebox to 2 weeks max
Phase 1: Software Golden Model (Emulator)
5 days · In Progress

The emulator is the golden reference. Every future component (RTL, optimized emulator, debugger) must produce identical results. Getting this right is non-negotiable.
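
As one concrete instance of the golden-model contract, a reduce check might look like the sketch below; run_kernel() and its signature are assumptions standing in for the emulator's real entry point:

```python
import numpy as np

def check_reduce_sum(emulator, n=1024):
    """The emulator's reduce kernel must match the Python reference exactly."""
    data = np.arange(n, dtype=np.int32)
    expected = int(data.sum())  # reference result computed in plain Python/NumPy
    # run_kernel() is a hypothetical entry point: load the kernel binary, pass the
    # input buffer via the launch ABI, return the output buffer it wrote.
    result = emulator.run_kernel("reduce", inputs=[data], output_len=1)
    assert int(result[0]) == expected
```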

Deliverables
  • All arithmetic ops implemented (int32, float32)
  • Load/store to global memory
  • Parameter passing via launch ABI
  • Lane ID / Global ID intrinsics
  • Predicated execution
  • Basic barrier (workgroup sync)
  • Atomic add (int32)
  • vecadd kernel passes
  • reduce kernel passes
Exit Criteria
  • vecadd kernel: emulator matches Python reference
  • reduce kernel: emulator matches Python reference
  • All ISA ops have unit tests
  • Memory model documented with examples
  • Performance counter skeleton exists
Risks & Mitigations
  • Memory model ambiguity → Document every edge case as we find it
  • Float precision issues → Use IEEE 754 strictly, document rounding
  • Barrier semantics → Copy CUDA barrier semantics initially
  • Happy path only → Property-based tests for edge cases (see the sketch below)
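
A property-based test for the "happy path only" risk could look like this sketch (using Hypothesis; emulator_add_i32 is a placeholder for the real ALU hook):

```python
from hypothesis import given, strategies as st

int32 = st.integers(min_value=-2**31, max_value=2**31 - 1)

def add_i32_reference(a, b):
    """Python reference for ADD: wrap the sum to signed 32 bits like the ISA does."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s

# Placeholder: wire this to the emulator's actual int32 ADD implementation.
emulator_add_i32 = add_i32_reference

@given(int32, int32)
def test_add_matches_reference(a, b):
    assert emulator_add_i32(a, b) == add_i32_reference(a, b)
```
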
Phase 2: Tooling
4 days · Upcoming

Good tooling accelerates everything. A flaky assembler means hours of "is this a bug or a tooling issue?" debugging, and the disassembler is essential for verifying the RTL later.
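
A round-trip check is the simplest way to keep the two tools honest; assemble() and disassemble() below are hypothetical entry points, not the project's actual API:

```python
def check_roundtrip(source_lines, assemble, disassemble):
    """Disassembler output must reassemble to the identical binary."""
    binary = assemble(source_lines)
    recovered = disassemble(binary)   # text listing recovered from the binary
    assert assemble(recovered) == binary, "round-trip produced a different binary"
```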

Deliverables
  • Assembler handles all ISA ops
  • Assembler has good error messages with line numbers
  • Disassembler round-trips cleanly
  • Emulator tracks instruction count, memory ops
  • matmul (tiled) kernel runs correctly (a Python reference sketch follows this list)
  • At least 3 example kernels documented
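
A pure-Python reference for the tiled matmul deliverable, written to mirror how the kernel would stage tiles (tile size and row-major layout are assumptions):

```python
def matmul_tiled_reference(A, B, n, tile=4):
    """C = A @ B for n x n row-major matrices stored as flat lists."""
    C = [0.0] * (n * n)
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):      # one "shared-memory tile" per iteration
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = C[i * n + j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += A[i * n + k] * B[k * n + j]
                        C[i * n + j] = acc
    return C
```
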
Exit Criteria
  • matmul kernel passes tests
  • Assembler fuzz-tested (no crashes on garbage input)
  • Disassembler output reassembles to identical binary
  • Performance counters report instruction mix
Risks & Mitigations
  • Grammar ambiguities → Use a proper parser (pest, nom), not regex
  • Binary format changes → Version header in the binary format
  • DSL scope creep → Keep the DSL optional; the assembler is primary
Phase 3: Microarchitecture Plan
3 days · Upcoming

RTL without a microarchitecture plan is like coding without design docs: possible, but painful. This phase prevents "architecture astronaut" problems in Phase 4.

Deliverables
  • Execution unit design (how many ALUs, what ops)
  • Lane width decision (4, 8, 16, 32?)
  • Register file design (ports, banking)
  • Scheduler design (in-order v0, scoreboard v1)
  • Memory coalescer rules documented
  • Command processor / queue design
  • DMA engine interface
  • Interface timing diagrams
Exit Criteria
  • ARCHITECTURE.md has complete block diagram
  • All module interfaces documented
  • Resource estimate for target FPGA
  • Scheduling algorithm documented
  • Memory coalescing rules with examples (see the sketch below)
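
One way to state a coalescing rule as an example (the segment size and the "one transaction per aligned segment" rule are placeholders until ARCHITECTURE.md fixes them):

```python
def count_transactions(lane_addresses, seg_bytes=64):
    """Lanes whose byte addresses fall in the same aligned segment share one transaction."""
    return len({addr // seg_bytes for addr in lane_addresses})

# 8 lanes reading consecutive float32 elements -> 1 transaction (fully coalesced).
assert count_transactions([i * 4 for i in range(8)]) == 1
# 8 lanes striding by 64 bytes -> 8 transactions (worst case).
assert count_transactions([i * 64 for i in range(8)]) == 8
```
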
Risks & Mitigations
Over-designingKeep v0 simple, document "v1 ideas" separately
Ignoring FPGA limitsCheck target FPGA resources early
Mismatched interfacesDefine contracts before modules
Phase 4: RTL on FPGA
8 days · Upcoming · High Risk

This is where PumpGPU becomes a real (soft) GPU. All previous phases were building toward this moment.

Deliverables
  • Fetch/decode unit
  • Scalar ALU
  • Vector ALU (SIMD lanes)
  • Register file
  • Load/store unit
  • Scratchpad (shared memory)
  • Command processor
  • DMA engine
  • UART interface for debugging
  • Ethernet or PCIe interface (stretch)
  • "Hello kernel" executes on FPGA
Target FPGA Classes
  • Entry: Arty A7, DE10-Lite - limited resources, good for the core
  • Mid: Nexys Video, DE10-Nano - enough for full v0
  • High: KCU105, VCU118 - PCIe, HBM possible
Exit Criteria
  • vecadd runs on FPGA, matches emulator
  • reduce runs on FPGA, matches emulator
  • RTL passes all emulator test vectors
  • Resource utilization documented
  • Clock frequency achieved documented
Risks & Mitigations
  • Timing closure → Start with a low clock, optimize later
  • RTL-emulator mismatch → Co-simulation from day 1 (see the sketch below)
  • Resource exhaustion → Check utilization after every module
  • Debugging hell → Extensive ILA triggers, UART logging
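
The co-simulation mitigation boils down to comparing memory images after the same kernel runs on both models; the sketch below assumes the dumps are already available as address-indexed arrays (how the RTL dump is obtained, simulation or UART readback, is out of scope here):

```python
def compare_memory_dumps(emulator_mem, rtl_mem, start, end):
    """After the same kernel runs, the RTL result region must match the emulator word-for-word."""
    mismatches = [(addr, emulator_mem[addr], rtl_mem[addr])
                  for addr in range(start, end)
                  if emulator_mem[addr] != rtl_mem[addr]]
    assert not mismatches, f"first mismatch at {mismatches[0]}"
```
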
Phase 5: Optimization & Advanced Features
2 days · Upcoming

v0 will be slow. Phase 5 is about understanding why and fixing the bottlenecks systematically.
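
As an example of the kind of counter Phase 5 adds, a shared-memory bank-conflict model might look like this (bank count and word size are placeholders):

```python
from collections import Counter

def extra_conflict_cycles(lane_addresses, num_banks=16, word_bytes=4):
    """Accesses in one cycle that map to the same bank are serialized."""
    banks = Counter((addr // word_bytes) % num_banks for addr in lane_addresses)
    return max(banks.values()) - 1   # cycles beyond the conflict-free case

# 8 lanes reading consecutive words hit 8 different banks: no conflicts.
assert extra_conflict_cycles([i * 4 for i in range(8)]) == 0
# 8 lanes striding by num_banks words all collide in one bank: 7 extra cycles.
assert extra_conflict_cycles([i * 16 * 4 for i in range(8)]) == 7
```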

Deliverables
  • Bank conflict counter in emulator
  • Coalescing efficiency metric
  • Scheduler improvements (reduce stalls)
  • Profile-guided optimization docs
  • Stable kernel suite (10+ kernels)
  • Performance comparison: emulator vs FPGA vs CPU
Exit Criteria
  • Measurable perf improvement on matmul
  • Profiling tools documented
  • At least 10 kernels in test suite
  • Performance numbers published
Phase 6 (Optional): Graphics Pipeline
Stretch · Post-Hackathon

Graphics is what makes GPUs "GPUs" in public perception. Even a simple triangle demo is hugely impactful for engagement.
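
For orientation, the standard edge-function approach to triangle rasterization fits in a few lines; this is a generic sketch, not necessarily the algorithm PumpGPU will implement:

```python
def rasterize_triangle(v0, v1, v2, width, height):
    """Yield (x, y) pixel coordinates covered by the triangle v0-v1-v2 in 2D screen space."""
    def edge(a, b, p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)                     # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Inside if all edge functions agree in sign (handles either winding order).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield x, y
```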

Deliverables
  • Triangle rasterization in emulator
  • Framebuffer memory region
  • Vertex shader subset (transform, project)
  • Fragment shader subset (color, texture sample)
  • HDMI output from FPGA
  • Spinning cube demo
Exit Criteria
  • Draws colored triangles
  • Runs on FPGA with display output
  • Mini-shader executes
Phase 7 (Optional): ASIC Path
Stretch · Post-Hackathon · High Risk

The ultimate goal of "building a GPU" is custom silicon. Even if we don't tape out, understanding the path is valuable.

Deliverables
  • OpenROAD flow documented
  • Synthesis results for core modules
  • Area/power estimates
  • MPW shuttle options researched (Google/eFabless, TinyTapeout)
  • Tapeout readiness checklist
  • Cost/timeline estimate
Exit Criteria
  • Synthesis completes for core
  • Feasibility document published
  • Go/no-go decision documented
Risks & Mitigations
  • Tool complexity → Start with OpenROAD tutorials
  • Cost prohibitive → Document shuttle options, crowdfunding
  • Design rule violations → Use proven PDKs (SKY130, GF180)
Pump.fun Build in Public

Why PumpGPU Will Win

Building a GPU from scratch, live, in 25 days.

Why PumpGPU Deserves to Win

PumpGPU isn't just another software project - it's an audacious attempt to build actual silicon from scratch, completely in public.

Unprecedented Technical Ambition

No one has ever live-streamed building a GPU from specification to silicon. This isn't a tutorial or a clone - it's original hardware design, documented from day zero. We're creating ISA specifications, writing assemblers, building emulators, and designing RTL that will run on real FPGAs.

📺 Ultimate Build in Public

Every commit, every design decision, every bug fix happens live. Viewers see the raw process of hardware engineering - the debugging sessions, the "aha" moments, the architectural pivots. This is transparency at its most extreme. No polished demos, just real engineering.

🎓 Massive Educational Value

GPU architecture is gatekept knowledge - mostly locked inside NVIDIA and AMD. PumpGPU breaks that barrier. Every episode teaches concepts that cost $50K+ in university courses: ISA design, memory coalescing, SIMD execution, RTL synthesis. We're creating the GPU course that doesn't exist.

🌍 Open Source Everything

MIT licensed from day one. Every line of code, every documentation file, every design document is public. The community can fork, learn, contribute, and build upon PumpGPU. We're not just building a GPU - we're building a foundation for open hardware.

🔥 Narrative That Captures Attention

"Solo dev builds GPU from scratch on stream" is a headline that writes itself. It's the kind of underdog story that resonates - combining extreme technical depth with accessible streaming content. Perfect for viral growth and community engagement.

🎯 Clear Milestones, Real Progress

We have a detailed roadmap with concrete deliverables: working emulator (done), assembler, disassembler, RTL modules, FPGA demo. Each stream advances toward a visible goal. Progress is measurable, verifiable, and undeniably real.

Perfect Fit for Pump.fun Hackathon

Build in Public

Every line of code written live on stream

Community Engagement

Real-time feedback shapes design decisions

Live Streaming

5+ streams per week minimum

Transparent Progress

Public GitHub, public roadmap, public demos

Broad User Interest

Hardware, AI, gaming, education communities

Scalable Potential

Educational platform, hardware products, consulting

Founder Discipline

Detailed specs, structured roadmap, consistent delivery

Organic Demand

Technical content that generates genuine interest