mini-sglang is a minimal, educational re-implementation of an LLM inference serving engine inspired by SGLang. It provides the full stack from the HTTP API server down to GPU kernels: a FastAPI-based HTTP/ZMQ server, a continuous-batching scheduler with a paged KV cache (in naive and radix-tree prefix-sharing variants), a pluggable attention-backend system (FlashAttention, FlashInfer, TensorRT-LLM), tensor-parallel distributed inference via NCCL, and support for multiple model architectures (LLaMA, Qwen2, Qwen3, Qwen3-MoE). The codebase is structured as a Python package with C++/CUDA/Triton extensions compiled via JIT and AOT loaders.
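The radix-tree prefix-sharing idea mentioned above can be sketched as a trie keyed by token IDs, so that requests with a common prompt prefix reuse the same cached path. This is a toy illustration only, not mini-sglang's actual data structures; all class and method names here are invented for the example, and a real engine would attach KV-cache page indices and eviction logic to each node.

```python
class RadixNode:
    """One node per token; children keyed by token id (hypothetical sketch)."""
    def __init__(self):
        self.children = {}   # token_id -> RadixNode
        self.ref_count = 0   # how many live requests pin this node

class RadixPrefixCache:
    """Toy prefix-sharing index over token sequences.

    In a real paged-KV-cache engine each node would also hold the GPU
    page(s) storing that token's key/value vectors, so a matched prefix
    means those pages can be reused instead of recomputed.
    """
    def __init__(self):
        self.root = RadixNode()

    def match_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched

    def insert(self, tokens):
        """Insert `tokens`, creating only the unshared suffix; pin the path."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
            node.ref_count += 1
```

For example, two chat requests sharing the same system prompt would hit `match_prefix` for the whole shared prefix and only extend the trie (and, in the real engine, allocate fresh KV pages) for their differing suffixes.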