💪 I created the first USB drive in the world that runs an LLM locally on it.
Cherry on top: it requires no dependencies. You can connect it to any computer, create a new file, and the content will be generated automatically from the USB side. Essentially, the first-ever native LLM USB.
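For the curious, here is a rough sketch of how the USB side can work: the Pi exposes a FAT image as a mass-storage gadget, and a small watcher on the Pi fills in any freshly created empty file. The paths and the inference call below are illustrative placeholders, not the actual project code (see the repo for that):

```c
/* Sketch: watch the gadget's backing store for new empty .txt files
 * and fill them with generated text. WATCH_DIR and the fputs() body
 * are placeholders; the real project wires this to llama.cpp. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define WATCH_DIR "/mnt/usb_share"  /* hypothetical mount of the FAT image */

int main(void) {
    for (;;) {
        DIR *dir = opendir(WATCH_DIR);
        if (!dir) { sleep(1); continue; }
        struct dirent *ent;
        while ((ent = readdir(dir)) != NULL) {
            /* crude filter: only look at .txt files */
            if (!strstr(ent->d_name, ".txt")) continue;
            char path[512];
            snprintf(path, sizeof(path), WATCH_DIR "/%s", ent->d_name);
            struct stat st;
            /* a freshly created, still-empty file is the "prompt" signal */
            if (stat(path, &st) == 0 && st.st_size == 0) {
                FILE *f = fopen(path, "w");
                if (f) {
                    /* placeholder: this is where the LLM output goes */
                    fputs("generated text goes here\n", f);
                    fclose(f);
                }
            }
        }
        closedir(dir);
        sleep(1);  /* poll once per second */
    }
    return 0;
}
```

One real-world wrinkle a sketch like this glosses over: the host and the Pi both touch the same FAT image, so the watcher has to re-read the image to see the host's writes.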
This was done on an 8-year-old Pi Zero, which has 512MB of RAM and an ARM1176JZF-S CPU.
Running an LLM on this, let alone with llama.cpp, was quite something. The ARM1176JZF-S was first released in 2002 and implements the armv6l ISA. It took 12 hours just to compile the whole llama.cpp source and more than a week for me to make it run on an unsupported ISA.
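To give a flavor of what "unsupported ISA" means here: ggml's ARM fast paths lean on NEON, which ARMv6 does not have, so the hot loops need scalar fallbacks. A minimal sketch of that pattern (illustrative only, not the actual llama.zero patch):

```c
/* Guard NEON intrinsics so the same code builds on armv6l,
 * where only the scalar path is available. */
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

float dot_f32(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    size_t i = 0;
#if defined(__ARM_NEON)
    /* vectorized path: 4 floats per iteration (ARMv7+ only) */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    sum = vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
        + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
#endif
    /* scalar fallback: the only option on the arm1176jzf-s */
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```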
The performance is quite terrible and offers no practical use, but it is a fun look into a future where LLMs can potentially run anywhere.
🎖️ Model benchmarks
- Tiny-15M: 223ms/token
- Lamini-T5-Flan-77M: 2.5s/token
- SmolLM2-136M: 2.2s/token
🧑‍💻 Repository
I made a repo that details the steps used and the modifications I made to llama.cpp: https://github.com/pham-tuan-binh/llama.zero
📽️ Chapters
00:00 - Intro
00:20 - Hardware & Casing
01:48 - Case Assembly
02:17 - Using Llama.cpp
02:51 - Fixing Llama.cpp
05:14 - LLM Demo & Benchmark
07:30 - Building a real USB
09:30 - USB Demo
11:57 - Endnote
👋 Feel free to contact me at [email protected] or comment below if you have any questions.