Agneya Tharun

Cybersecurity · Machine Learning · Robotics

Featured

d4rt clone

A single feed-forward model that turns ordinary video into 4D geometry: per-pixel depth, long-range 3D point tracking, and camera pose, all recovered from one unified query interface.

PyTorch · ViT · 3D Vision · CUDA

01 / Input video

A short video clip is resized and fed to the model.

64× less compute

single L40S vs 64-TPU cluster

Reproduced the paper's benchmarks on one NVIDIA L40S GPU. Matched the paper's ViT-B Sintel depth loss and stayed competitive with the 1B-parameter VGGT model on Sintel PC L1 geometric-mean loss (1.635 vs 1.582) at 7× fewer parameters.

−53% trajectory error

custom pose-decoder head

Engineered a Pi3/LoGeR-style 5-layer pose head with 9D-rotation SVD orthogonalization to fix the pose-estimation bottleneck, cutting absolute trajectory error (ATE) by 53% and beating the published ViT-B baseline on camera pose by 26%.
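
The orthogonalization step can be illustrated with a minimal NumPy sketch. It assumes the head emits nine raw numbers read as a row-major 3×3 matrix; the function name `rotation_from_9d` is hypothetical and this is not the project's actual code.

```python
import numpy as np

def rotation_from_9d(raw9):
    """Project a raw 9-vector onto the nearest rotation matrix (Frobenius norm)."""
    # Assumption: the nine outputs are laid out row-major as a 3x3 matrix.
    m = np.asarray(raw9, dtype=np.float64).reshape(3, 3)
    u, _, vt = np.linalg.svd(m)
    # u @ vt is orthogonal, so its determinant is +1 or -1.
    # Flipping the last singular direction when it is -1 avoids reflections.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

# Usage: a noisy, non-orthogonal prediction becomes a valid rotation.
rng = np.random.default_rng(0)
noisy = np.eye(3).ravel() + 0.1 * rng.standard_normal(9)
R = rotation_from_9d(noisy)
```

Because the projection is built from an SVD, the result is orthonormal with determinant +1 for any non-degenerate input, so no extra normalization is needed downstream.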

2.4× train · 2,400 fps

throughput & memory

Vectorized query generation (256× speedup), pre-resized fp16 dataset caches (35× less disk), automatic mixed precision (AMP), and CUDA-allocator tuning enabled full ViT-B training in 46 GB on a single GPU. Inference runs at 50 clips/sec in fp16 with a 3.5 GB peak memory footprint.

Incremental 3D Reconstruction

A live phone feed becomes a continuously growing 3D point cloud in the browser, with text-promptable 3D object detection. Built on the π³ / Pi3X feed-forward geometry model at TreeHacks 2026.

Pi3X · SAM3 · three.js · FastAPI

01 / Capture

A phone streams camera frames to the server.

Real time, in browser

phone → server → viewer

A three-process system: a phone sender streams frames over secure WebSockets to a FastAPI server with a bounded, drop-on-overflow queue, and a three.js viewer renders the cloud as it grows.
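
A bounded, drop-on-overflow queue can be sketched in a few lines of Python. The sketch assumes the oldest frame is the one discarded when the queue is full, which keeps the viewer close to real time; the page does not say which end is dropped, and the class name `DropOldestQueue` is hypothetical.

```python
from collections import deque

class DropOldestQueue:
    """Fixed-capacity frame queue that discards the oldest item when full."""

    def __init__(self, maxsize):
        # deque(maxlen=...) evicts from the left automatically on append.
        self._items = deque(maxlen=maxsize)
        self.dropped = 0  # number of frames discarded so far

    def put(self, frame):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(frame)

    def get(self):
        # Returns None when empty rather than blocking.
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)

# Usage: a slow consumer never sees more than `maxsize` pending frames.
q = DropOldestQueue(maxsize=3)
for frame_id in range(5):
    q.put(frame_id)
```

The design choice here is to bound latency rather than guarantee delivery: when the model falls behind the camera, stale frames are worth less than fresh ones.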

Sim(3) Umeyama stitching

chunked + prior-conditioned

Frames run through Pi3X in 16-frame chunks with 6-frame overlap; the previous chunk's poses, depth, and rays are injected as a prior, and each new chunk is aligned into the global frame by a masked Umeyama SVD.
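
The alignment step can be sketched as a masked Umeyama fit in NumPy. It assumes the overlapping frames supply corresponding 3D points in the new chunk's frame (`src`) and the global frame (`dst`), with a boolean mask marking the points to trust; the function name `umeyama_sim3` and these inputs are illustrative, not the project's actual interface.

```python
import numpy as np

def umeyama_sim3(src, dst, mask=None):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    if mask is not None:
        keep = np.asarray(mask, dtype=bool)
        src, dst = src[keep], dst[keep]
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n                      # 3x3 cross-covariance
    u, d, vt = np.linalg.svd(cov)
    fix = np.ones(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        fix[-1] = -1.0                       # avoid returning a reflection
    R = u @ np.diag(fix) @ vt
    var_s = (xs ** 2).sum() / n              # variance of the source points
    s = (d * fix).sum() / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

def apply_sim3(s, R, t, pts):
    """Map an (N, 3) array of points into the target frame."""
    return s * np.asarray(pts, dtype=np.float64) @ R.T + t
```

Once `s`, `R`, and `t` are estimated from the overlap, `apply_sim3` would move every point of the new chunk into the global frame.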

Text-promptable 3D boxes

SAM3 → 3D OBBs

SAM3 segments prompted objects on keyframes; the 2D masks are lifted into 3D through Pi3X's dense point maps and fitted to oriented bounding boxes, deduplicated across frames and streamed to the viewer.
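
Lifting a mask into 3D and fitting a box can be sketched as follows. The sketch assumes a dense point map of shape (H, W, 3) and a boolean mask of shape (H, W), and it fits the box with PCA, which is a stand-in for whatever fitting method the project uses; `lift_mask` and `fit_obb` are hypothetical names.

```python
import numpy as np

def lift_mask(point_map, mask):
    """Select the 3D points under a 2D mask: (H, W, 3) and (H, W) -> (N, 3)."""
    return np.asarray(point_map, dtype=np.float64)[np.asarray(mask, dtype=bool)]

def fit_obb(points):
    """PCA-based oriented bounding box: returns (center, axes, extents).

    axes is 3x3 with one unit box axis per row; extents are full side lengths
    along those axes, ordered from smallest to largest variance.
    """
    pts = np.asarray(points, dtype=np.float64)
    mean = pts.mean(axis=0)
    centered = pts - mean
    # Eigenvectors of the covariance matrix give the box orientation.
    _, vecs = np.linalg.eigh(np.cov(centered.T))
    local = centered @ vecs                  # coordinates in the box frame
    lo, hi = local.min(axis=0), local.max(axis=0)
    axes = vecs.T
    center = mean + ((lo + hi) / 2.0) @ axes
    return center, axes, hi - lo
```

PCA boxes are cheap but sensitive to stray points, so a real pipeline would likely filter outliers from the masked points before fitting.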

llama.cpp · Vulkan Q2_0 Kernel

A merged contribution to llama.cpp's Vulkan backend implementing the Q2_0 (2-bit ternary) format, so 1.58-bit BitNet-style models run fully on any Vulkan GPU instead of falling back to CPU.

C++ · GLSL · Vulkan · ggml

01 / The problem

Ternary models pack weights into Q2_0; Vulkan had no kernel for it.

2 bits / weight

ternary Q2_0 format

A block of 128 weights stores one fp16 scale plus 32 bytes of 2-bit codes. Each code maps to a ternary value, so 1.58-bit models pack into roughly 2.1 bits per weight on the GPU.
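
The storage arithmetic can be checked with a small Python sketch of one block. The sizes (128 weights, one fp16 scale, 32 code bytes) come from the description above; the field order, the low-bits-first packing, and the code mapping 0, 1, 2 to -1, 0, +1 are assumptions made for illustration and may not match the real layout.

```python
import struct

BLOCK = 128  # weights per block

def pack_block(scale, ternary):
    """Pack 128 ternary weights (-1, 0, +1) into a 34-byte block."""
    assert len(ternary) == BLOCK
    codes = bytearray(BLOCK // 4)            # four 2-bit codes per byte
    for i, w in enumerate(ternary):
        # Assumed mapping: -1 -> 0, 0 -> 1, +1 -> 2, low bits first.
        codes[i // 4] |= (w + 1) << (2 * (i % 4))
    return struct.pack("<e", scale) + bytes(codes)   # fp16 scale, then codes

def unpack_block(blob):
    """Dequantize one block back to 128 floats."""
    (scale,) = struct.unpack_from("<e", blob, 0)
    codes = blob[2:]
    return [scale * (((codes[i // 4] >> (2 * (i % 4))) & 3) - 1)
            for i in range(BLOCK)]

weights = [(i % 3) - 1 for i in range(BLOCK)]
blob = pack_block(0.5, weights)
bits_per_weight = len(blob) * 8 / BLOCK      # 34 bytes * 8 / 128 = 2.125
```

The extra 0.125 bits per weight over the raw 2-bit codes is the cost of the per-block fp16 scale.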

262,677 checks pass

0 failed

Five GLSL shaders cover every Vulkan path, wired in at ~32 registration points. A 959-line CPU simulator re-implements each shader bit-for-bit and validates it against the ggml reference.

31.5 tok / s

Ternary-Bonsai-8B, full GPU

End to end on an Intel Arc GPU, an 8B ternary model offloads fully to Vulkan and produces coherent text at 31.5 tokens/sec for generation and 52 tokens/sec for prompt processing. Merged as PR #32, +1,128 lines.

Selected Work

01 · repo ↗

GPU Compute Core over UART

A fabricated 2×2 matrix-multiplier chip, taped out through TinyTapeout. It reads two matrices over UART, computes the product in a state machine, and returns the result on serial and parallel pins.

Verilog · UART · Sky130

1st · Jump Trading

02 · private

CMU AI Poker Bot

A bot that approximates Nash equilibrium for a Texas Hold'em variant. It originally relied only on precomputed hand-rank tables and Monte Carlo methods; I then trained a model to recognize the top 10 opposing bots and deploy specific countermeasures against each.

Python · MCCFR · RL

3rd · VentureHacks

03 · demo ↗

Vantage

A geospatial disaster-response tool that pulls satellite imagery, detects each roof with a vision-language model, scores damage 0 to 100, and plans contractor routes with PDF reports.

FastAPI · Moondream · React

04 · repo ↗

AVScan2Vec2

A dual-transformer autoencoder that embeds a file's antivirus-scan results into a vector, enabling malware classification, clustering, and nearest-neighbor lookup over a Qdrant index.

PyTorch · Transformers · Qdrant

1st · NexHacks

05 · private

Cognit

Uses a Tobii eye-tracker to verify a user actually read a document, then mints a verifiable on-chain certificate of completion from a generated comprehension quiz.

FastAPI · Tobii · Solidity

06 · demo ↗

Sparrows

A platform for schools to manage schedules, clubs, and events, with a student side offering academic resources and an in-browser code compiler that runs on the edge via WebAssembly.

Next.js · WASM · Cloudflare

07 · demo ↗

Perspectron

Turns any screen into a touchscreen. A phone camera tracks hand movements and sends position deltas over WebRTC; homographies precomputed from ArUco markers warp the cursor position to compensate for perspective distortion.
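
The perspective correction can be sketched with NumPy. The sketch assumes four marker corners are seen in the camera image and that their matching screen coordinates are known; the function names and the sample coordinates are hypothetical, and marker detection itself is left out.

```python
import numpy as np

def homography_from_4(src, dst):
    """Solve for the 3x3 homography mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, with H[2][2] fixed to 1.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply a homography to one point, including the perspective divide."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Hypothetical marker corners in the camera image and on a 1920x1080 screen.
camera_corners = [(100, 120), (520, 80), (560, 400), (60, 380)]
screen_corners = [(0, 0), (1920, 0), (1920, 1080), (0, 1080)]
H = homography_from_4(camera_corners, screen_corners)
cursor = warp_point(H, 300, 240)
```

Computing `H` once and reusing it per frame keeps the per-update cost to a single 3×3 matrix-vector product.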

MediaPipe · WebRTC · Electron

08 · demo ↗

Thryving

A mental-health webapp for teens: keep a journal, build widget flowcharts to map your journey, and get AI sentiment analysis on entries, with auth and subscriptions.

Next.js · Gemini · Stripe

1st · HackCMU

09 · demo ↗

DuckHunt Remastered

Turns your phone into a light gun to play Duck Hunt on a shared screen with friends, streaming phone motion with near-zero latency over WebRTC.

WebRTC · Socket.io · Node

10 · demo ↗

TWDNE

Every subdirectory you visit is generated on the fly by an LLM: visit /anything and get a plausible page for it, rendered live.

Next.js · Groq · Supabase