Live Emulator Output
Input A[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
Input B[0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
Output[0.0, 11.0, 22.0, 33.0, 44.0, 55.0, 66.0, 77.0]
✓ Vector addition computed on PumpGPU emulator
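For context, a minimal Rust sketch of the CPU-side check an output like this is verified against; `vecadd_reference` and the hard-coded result are illustrative stand-ins, not the project's actual test harness.

```rust
// Reference check for the vecadd demo above (illustrative names only).
fn vecadd_reference(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

fn main() {
    let a: Vec<f32> = (0..8).map(|i| i as f32).collect();
    let b: Vec<f32> = (0..8).map(|i| (i * 10) as f32).collect();
    let expected = vecadd_reference(&a, &b);
    // In the real flow this vector would come from the PumpGPU emulator run.
    let emulator_output = vec![0.0, 11.0, 22.0, 33.0, 44.0, 55.0, 66.0, 77.0];
    assert_eq!(emulator_output, expected);
    println!("✓ vector addition matches the CPU reference");
}
```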
Without a clear spec, we'll waste months on rework. The spec is the contract between all components: assembler, emulator, and eventually RTL must agree on every bit.
Deliverables
- SPEC.md v0 with ISA encoding table (an illustrative encoding sketch follows this list)
- Memory model documentation (global, shared, registers)
- Kernel launch ABI defined
- Repository CI green
- Emulator boots and executes NOP
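To make the encoding-table deliverable concrete, here is a hedged sketch of one possible fixed 32-bit, three-register format. Every field width and opcode value below is an assumption for illustration; the real layout lives in SPEC.md.

```rust
// Hypothetical 32-bit encoding, for illustration only:
//   [31:24] opcode   [23:16] dst reg   [15:8] src0 reg   [7:0] src1 reg
fn encode_rrr(opcode: u8, dst: u8, src0: u8, src1: u8) -> u32 {
    (opcode as u32) << 24 | (dst as u32) << 16 | (src0 as u32) << 8 | src1 as u32
}

fn decode_opcode(word: u32) -> u8 {
    (word >> 24) as u8
}

fn main() {
    const OP_FADD: u8 = 0x10; // illustrative opcode value, not from SPEC.md
    let word = encode_rrr(OP_FADD, 2, 0, 1); // fadd r2, r0, r1
    assert_eq!(decode_opcode(word), OP_FADD);
    println!("encoded word: {word:#010x}");
}
```

A fixed-width format like this keeps the assembler, emulator, and eventual RTL decoder trivially in sync, which is exactly the "agree on every bit" contract above.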
Exit Criteria
- SPEC.md v0 committed and reviewed
- Emulator compiles and runs empty kernel
- CI pipeline passes
- At least one stream episode completed
Risks & Mitigations
- Over-engineering the ISA: Start with minimal ops, add later
- SIMD vs SIMT paralysis: Commit to SIMD for v0, document the SIMT path
- Scope creep: Timebox to 2 weeks max
The emulator is the golden reference. Every future component (RTL, optimized emulator, debugger) must produce identical results. Getting this right is non-negotiable.
Deliverables
- All arithmetic ops implemented (int32, float32)
- Load/store to global memory
- Parameter passing via launch ABI
- Lane ID / Global ID intrinsics
- Predicated execution (see the sketch after this list)
- Basic barrier (workgroup sync)
- Atomic add (int32)
- vecadd kernel passes
- reduce kernel passes
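As flagged in the predicated-execution item above, a minimal sketch of how the emulator might model a predicated SIMD instruction. The lane count, register-file shape, and names (`Warp`, `exec_mask`) are assumptions, not the actual implementation.

```rust
// Predicated SIMD execution, sketched: lanes whose predicate bit is clear
// are simply skipped, leaving their registers untouched.
const LANES: usize = 8;

struct Warp {
    regs: [[f32; 4]; LANES], // 4 float registers per lane, illustrative
    exec_mask: [bool; LANES],
}

impl Warp {
    /// fadd rd, ra, rb across all lanes whose predicate bit is set.
    fn fadd(&mut self, rd: usize, ra: usize, rb: usize) {
        for lane in 0..LANES {
            if self.exec_mask[lane] {
                self.regs[lane][rd] = self.regs[lane][ra] + self.regs[lane][rb];
            }
        }
    }
}

fn main() {
    let mut warp = Warp { regs: [[0.0; 4]; LANES], exec_mask: [true; LANES] };
    for lane in 0..LANES {
        warp.regs[lane][0] = lane as f32;        // r0 = lane id
        warp.regs[lane][1] = lane as f32 * 10.0; // r1 = 10 * lane id
    }
    warp.exec_mask[7] = false; // lane 7 predicated off
    warp.fadd(2, 0, 1);
    assert_eq!(warp.regs[3][2], 33.0);
    assert_eq!(warp.regs[7][2], 0.0); // untouched: predicate was false
    println!("predicated fadd OK");
}
```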
Exit Criteria
- vecadd kernel: emulator matches Python reference
- reduce kernel: emulator matches Python reference
- All ISA ops have unit tests
- Memory model documented with examples
- Performance counter skeleton exists
Risks & Mitigations
- Memory model ambiguity: Document every edge case as we find it
- Float precision issues: Use IEEE 754 strictly, document rounding
- Barrier semantics: Copy CUDA barrier semantics initially
- Happy path only: Property-based tests for edge cases
Good tooling accelerates everything. A flaky assembler means hours of "is this a bug or a tooling issue?" debugging, and a disassembler is essential for verifying the RTL later.
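A sketch of the round-trip property that keeps this tooling honest. `assemble` and `disassemble` here are toy stand-ins for the real tools; only the check at the end is the point.

```rust
// Assemble -> disassemble -> reassemble must reproduce the identical binary.
fn assemble(src: &str) -> Vec<u32> {
    // Toy stand-in: accept "word 0xXXXXXXXX" lines so the sketch runs on its own.
    src.lines()
        .filter_map(|l| l.trim().strip_prefix("word "))
        .map(|hex| u32::from_str_radix(hex.trim_start_matches("0x"), 16).unwrap())
        .collect()
}

fn disassemble(words: &[u32]) -> String {
    words.iter().map(|w| format!("word {w:#010x}\n")).collect()
}

fn main() {
    let src = "word 0xdeadbeef\nword 0x10020001\n";
    let binary = assemble(src);
    let listing = disassemble(&binary);
    // The exit criterion below: disassembler output reassembles to the same binary.
    assert_eq!(assemble(&listing), binary);
    println!("round-trip OK");
}
```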
Deliverables
- Assembler handles all ISA ops
- Assembler has good error messages with line numbers
- Disassembler round-trips cleanly
- Emulator tracks instruction count, memory ops
- matmul (tiled) kernel runs correctly
- At least 3 example kernels documented
Exit Criteria
- matmul kernel passes tests
- Assembler fuzz-tested (no crashes on garbage input)
- Disassembler output reassembles to identical binary
- Performance counters report instruction mix
Risks & Mitigations
- Grammar ambiguities: Use a proper parser (pest, nom), not regex (a parsing sketch follows this list)
- Binary format changes: Version header in the binary format
- DSL scope creep: Keep the DSL optional; the assembler is primary
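As the first mitigation above suggests, a combinator parser beats regex here. A minimal sketch in nom 7 style for one hypothetical instruction line; the mnemonic and register syntax are assumptions, not the project's actual grammar.

```rust
// Parses a line like "fadd r2, r0, r1" into a mnemonic plus register list.
use nom::{
    character::complete::{alpha1, char, space0, space1, u8 as reg_num},
    multi::separated_list1,
    sequence::{delimited, preceded},
    IResult,
};

fn register(input: &str) -> IResult<&str, u8> {
    preceded(char('r'), reg_num)(input)
}

fn instruction(input: &str) -> IResult<&str, (&str, Vec<u8>)> {
    let (input, mnemonic) = alpha1(input)?;
    let (input, _) = space1(input)?;
    let (input, operands) =
        separated_list1(delimited(space0, char(','), space0), register)(input)?;
    Ok((input, (mnemonic, operands)))
}

fn main() {
    let (_rest, (op, regs)) = instruction("fadd r2, r0, r1").unwrap();
    assert_eq!(op, "fadd");
    assert_eq!(regs, vec![2u8, 0, 1]);
    println!("parsed: {op} {regs:?}");
}
```

A real grammar also needs labels, immediates, and comments, but nom's error positions already give the line-level diagnostics the deliverables ask for.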
RTL without a microarchitecture plan is like coding without design docs: possible, but painful. This phase prevents "architecture astronaut" problems in Phase 4.
Deliverables
- Execution unit design (how many ALUs, what ops)
- Lane width decision (4, 8, 16, 32?)
- Register file design (ports, banking)
- Scheduler design (in-order v0, scoreboard v1)
- Memory coalescer rules documented (a toy rule check follows this list)
- Command processor / queue design
- DMA engine interface
- Interface timing diagrams
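For the coalescer item above, a toy check of one possible v0 rule: a load coalesces into a single transaction only if the lane addresses are consecutive 4-byte words starting at a 32-byte-aligned base. The real rules are a design decision for ARCHITECTURE.md.

```rust
// Illustrative v0 coalescing rule; the real rules live in ARCHITECTURE.md.
fn coalesces(addrs: &[u32]) -> bool {
    let Some(&base) = addrs.first() else { return true };
    base % 32 == 0
        && addrs
            .iter()
            .enumerate()
            .all(|(lane, &a)| a == base + 4 * lane as u32)
}

fn main() {
    let unit_stride: Vec<u32> = (0..8).map(|lane| 0x1000 + 4 * lane).collect();
    let strided: Vec<u32> = (0..8).map(|lane| 0x1000 + 8 * lane).collect();
    assert!(coalesces(&unit_stride));  // one transaction
    assert!(!coalesces(&strided));     // falls back to multiple transactions
    println!("coalescing check OK");
}
```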
Exit Criteria
- ARCHITECTURE.md has complete block diagram
- All module interfaces documented
- Resource estimate for target FPGA
- Scheduling algorithm documented
- Memory coalescing rules with examples
Risks & Mitigations
- Over-designing: Keep v0 simple, document "v1 ideas" separately
- Ignoring FPGA limits: Check target FPGA resources early
- Mismatched interfaces: Define contracts before modules
This is where PumpGPU becomes a real (soft) GPU. All previous phases were building toward this moment.
Deliverables
- Fetch/decode unit
- Scalar ALU
- Vector ALU (SIMD lanes)
- Register file
- Load/store unit
- Scratchpad (shared memory)
- Command processor
- DMA engine
- UART interface for debugging
- Ethernet or PCIe interface (stretch)
- "Hello kernel" executes on FPGA
Target FPGA Classes
- Entry: Arty A7, DE10-Lite - limited resources, good for the core
- Mid: Nexys Video, DE10-Nano - enough for a full v0
- High: KCU105, VCU118 - PCIe, HBM possible
Exit Criteria
- vecadd runs on FPGA, matches emulator
- reduce runs on FPGA, matches emulator
- RTL passes all emulator test vectors
- Resource utilization documented
- Clock frequency achieved documented
Risks & Mitigations
- Timing closure: Start with a low clock, optimize later
- RTL-emulator mismatch: Co-simulation from day 1 (see the sketch after this list)
- Resource exhaustion: Check utilization after every module
- Debugging hell: Extensive ILA triggers, UART logging
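The co-simulation mitigation above boils down to a differential check: run the same test vector through the emulator and the RTL simulation, then diff the results word by word. `run_emulator` and `run_rtl_sim` are placeholders for harnesses that would drive the emulator and the RTL simulator.

```rust
// Differential co-simulation check, sketched with stand-in harnesses.
fn run_emulator(input: &[u32]) -> Vec<u32> {
    input.iter().map(|x| x.wrapping_add(1)).collect() // stand-in kernel result
}

fn run_rtl_sim(input: &[u32]) -> Vec<u32> {
    input.iter().map(|x| x.wrapping_add(1)).collect() // stand-in memory dump
}

fn main() {
    let vector: Vec<u32> = (0..16).collect();
    let (emu, rtl) = (run_emulator(&vector), run_rtl_sim(&vector));
    for (i, (e, r)) in emu.iter().zip(&rtl).enumerate() {
        assert_eq!(e, r, "mismatch at word {i}: emulator {e:#x} vs RTL {r:#x}");
    }
    println!("co-simulation: {} words match", emu.len());
}
```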
v0 will be slow. Phase 5 is about understanding why and fixing the bottlenecks systematically.
Deliverables
- Bank conflict counter in emulator (see the sketch after this list)
- Coalescing efficiency metric
- Scheduler improvements (reduce stalls)
- Profile-guided optimization docs
- Stable kernel suite (10+ kernels)
- Performance comparison: emulator vs FPGA vs CPU
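For the bank-conflict-counter deliverable, a sketch assuming 16 banks of 4-byte words; the actual bank count and conflict model are microarchitecture decisions.

```rust
// Shared-memory bank conflict counter, illustrative banking scheme.
const BANKS: usize = 16;

/// Extra cycles a wavefront access costs from bank conflicts: the worst-case
/// number of lanes hitting one bank, minus the conflict-free cost of 1.
fn bank_conflict_penalty(addrs: &[u32]) -> usize {
    let mut hits = [0usize; BANKS];
    for &a in addrs {
        hits[(a as usize / 4) % BANKS] += 1;
    }
    hits.iter().copied().max().unwrap_or(1).saturating_sub(1)
}

fn main() {
    let conflict_free: Vec<u32> = (0..16).map(|lane| lane * 4).collect();
    let all_same_bank: Vec<u32> = (0..16).map(|lane| lane * 4 * BANKS as u32).collect();
    assert_eq!(bank_conflict_penalty(&conflict_free), 0);
    assert_eq!(bank_conflict_penalty(&all_same_bank), 15);
    println!("bank conflict counter OK");
}
```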
Exit Criteria
- Measurable perf improvement on matmul
- Profiling tools documented
- At least 10 kernels in test suite
- Performance numbers published
Graphics is what makes GPUs "GPUs" in public perception. Even a simple triangle demo is hugely impactful for engagement.
Deliverables
- Triangle rasterization in emulator (an edge-function sketch follows this list)
- Framebuffer memory region
- Vertex shader subset (transform, project)
- Fragment shader subset (color, texture sample)
- HDMI output from FPGA
- Spinning cube demo
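For the rasterization item above, a minimal edge-function sketch that fills a tiny text "framebuffer"; the idea is that the real pipeline would evaluate the same inside test per fragment on the GPU.

```rust
// Edge-function point-in-triangle test, the core of a simple rasterizer.
fn edge(ax: f32, ay: f32, bx: f32, by: f32, px: f32, py: f32) -> f32 {
    (bx - ax) * (py - ay) - (by - ay) * (px - ax)
}

fn inside(tri: [(f32, f32); 3], px: f32, py: f32) -> bool {
    let [(ax, ay), (bx, by), (cx, cy)] = tri;
    let w0 = edge(ax, ay, bx, by, px, py);
    let w1 = edge(bx, by, cx, cy, px, py);
    let w2 = edge(cx, cy, ax, ay, px, py);
    // Accept either winding order for this sketch.
    (w0 >= 0.0 && w1 >= 0.0 && w2 >= 0.0) || (w0 <= 0.0 && w1 <= 0.0 && w2 <= 0.0)
}

fn main() {
    // 16x16 "framebuffer" printed to the terminal, sampled at pixel centers.
    let tri = [(1.0, 1.0), (14.0, 3.0), (6.0, 14.0)];
    for y in 0..16 {
        let row: String = (0..16)
            .map(|x| if inside(tri, x as f32 + 0.5, y as f32 + 0.5) { '#' } else { '.' })
            .collect();
        println!("{row}");
    }
}
```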
Exit Criteria
- Draws colored triangles
- Runs on FPGA with display output
- Mini-shader executes
The ultimate goal of "building a GPU" is custom silicon. Even if we don't tape out, understanding the path is valuable.
Deliverables
- OpenROAD flow documented
- Synthesis results for core modules
- Area/power estimates
- MPW shuttle options researched (Google/eFabless, TinyTapeout)
- Tapeout readiness checklist
- Cost/timeline estimate
Exit Criteria
- Synthesis completes for core
- Feasibility document published
- Go/no-go decision documented
Risks & Mitigations
- Tool complexity: Start with OpenROAD tutorials
- Cost prohibitive: Document shuttle options, crowdfunding
- Design rule violations: Use proven PDKs (SKY130, GF180)
Pump.fun Build in Public
Why PumpGPU Will Win
Building a GPU from scratch, live, in 25 days.
Why PumpGPU Deserves to Win
PumpGPU isn't just another software project - it's an audacious attempt to build actual silicon from scratch, completely in public.
No one has ever live-streamed building a GPU from specification to silicon. This isn't a tutorial or a clone - it's original hardware design, documented from day zero. We're creating ISA specifications, writing assemblers, building emulators, and designing RTL that will run on real FPGAs.
Every commit, every design decision, every bug fix happens live. Viewers see the raw process of hardware engineering - the debugging sessions, the "aha" moments, the architectural pivots. This is transparency at its most extreme. No polished demos, just real engineering.
GPU architecture is gatekept knowledge - mostly locked inside NVIDIA and AMD. PumpGPU breaks that barrier. Every episode teaches concepts that cost $50K+ in university courses: ISA design, memory coalescing, SIMD execution, RTL synthesis. We're creating the GPU course that doesn't exist.
MIT licensed from day one. Every line of code, every documentation file, every design document is public. The community can fork, learn, contribute, and build upon PumpGPU. We're not just building a GPU - we're building a foundation for open hardware.
"Solo dev builds GPU from scratch on stream" is a headline that writes itself. It's the kind of underdog story that resonates - combining extreme technical depth with accessible streaming content. Perfect for viral growth and community engagement.
We have a detailed roadmap with concrete deliverables: working emulator (done), assembler, disassembler, RTL modules, FPGA demo. Each stream advances toward a visible goal. Progress is measurable, verifiable, and undeniably real.
Perfect Fit for Pump.fun Hackathon
- Build in Public: ✓ Every line of code written live on stream
- Community Engagement: ✓ Real-time feedback shapes design decisions
- Live Streaming: ✓ 5+ streams per week minimum
- Transparent Progress: ✓ Public GitHub, public roadmap, public demos
- Broad User Interest: ✓ Hardware, AI, gaming, education communities
- Scalable Potential: ✓ Educational platform, hardware products, consulting
- Founder Discipline: ✓ Detailed specs, structured roadmap, consistent delivery
- Organic Demand: ✓ Technical content that generates genuine interest