Jetson Thor vs Jetson Orin 64GB: Local LLM Benchmark Notes

Edge AI hardware is no longer just for camera pipelines and lightweight detection models. With enough memory and the right runtime, devices like Jetson Thor and Jetson Orin 64GB can run local language models for robotics, inspection, private assistants, and on-site automation.

I tested two real devices on my lab network:

Jetson Thor at 100.98.202.31
Jetson Orin 64GB at 100.97.175.73

This is not a synthetic leaderboard. It is a practical engineering note: what was running, what responded, how fast it felt, and what I would use each device for.

Test Setup

Device	Runtime	Model	API
Jetson Thor	SGLang	Qwen3.6-35B-A3B-FP8	OpenAI-compatible `/v1/chat/completions`
Jetson Orin 64GB	Ollama	Qwen2.5 0.5B Q4_K_M	Ollama `/api/generate`

The prompts covered three common local AI tasks:

Chinese edge AI scenario explanation
English summarization
Short code explanation

Each run used a small response budget of about 160 output tokens. This is enough to compare interactive response behavior without turning the test into a long-generation benchmark.

Quick Results

Device	Model	Test Cases	Result
Jetson Thor	Qwen3.6-35B-A3B-FP8	3 / 3 passed	~11.2s per 160-token response
Jetson Orin 64GB	Qwen2.5 0.5B Q4_K_M	3 / 3 passed	~145 tokens/s

The Thor result should be read with context: the device was already busy, with an SGLang scheduler and another Python service consuming GPU/CPU resources. Even so, the 35B FP8 model completed every prompt successfully.

The Orin result used a much smaller model, so it is not a quality comparison. It is a useful speed baseline for lightweight local assistants and embedded control flows.

Thor: Large Local Model on Edge Hardware

Thor was running:

Qwen3.6-35B-A3B-FP8
SGLang
Context length: 8192
Served model name: qwen3.6

Measured response times:

Case	Completion Tokens	Time
Chinese Q&A	160	10.953s
English summary	160	11.181s
Code explanation	160	11.511s

That puts the observed speed around 14 tokens/s for this loaded test condition.

Practical Meaning

Thor is the more interesting platform for heavier local LLM workloads:

local customer service assistant with private product documents
robotics scene reasoning with camera and LiDAR context
industrial inspection reports generated near the device
internal knowledge base search without sending data to the cloud

For production, I would test:

warm vs cold latency
concurrent requests
prompt length sensitivity
quantization choices
JSON output reliability
vision-language model throughput

Orin 64GB: Fast Small-Model Local Inference

Orin 64GB was running Ollama with:

qwen2.5:0.5b
GGUF Q4_K_M

Measured results:

Case	Output Tokens	Time	Speed
Chinese Q&A	160	3.255s	144.28 tokens/s
English summary	89	0.901s	147.06 tokens/s
Code explanation	50	0.600s	145.22 tokens/s

This is exactly the kind of result that makes small local models useful. They may not replace a large cloud model, but they are fast enough for:

menu-driven device assistants
simple Chinese and English Q&A
local command parsing
short summaries
workflow automation
offline demo experiences

Which Device Should You Use?

Choose Jetson Thor if you need:

larger models
more complex reasoning
multimodal robotics demos
SGLang or OpenAI-compatible local serving
a stronger platform for future VLM testing

Choose Jetson Orin 64GB if you need:

stable edge deployment
lower cost than a larger platform
small LLM assistants
computer vision plus lightweight language output
local automation and industrial demos

Business Takeaway

The best commercial angle is not "which device has the best benchmark number." The better offer is:

We test the actual model, on the actual edge device, for the actual workflow before the customer buys or deploys hardware.

That is useful for device distributors, factories, robotics teams, and small businesses that want AI but do not know whether they need Thor, Orin, a workstation, or a cloud API.

For service packaging, these benchmark notes map cleanly to three offers:

Edge AI model feasibility test
Private local AI assistant demo
Robot perception and reporting prototype

Next Tests

The next benchmark round should add:

Qwen 7B or 14B on Orin 64GB
the same prompt set across Thor and Orin
concurrent request testing
memory and power logging
vision-language prompts with camera frames
long-context RAG prompts with product documents

Those tests will be more useful than a one-time speed number because they map directly to real deployment decisions.

Bottom Line

In this first local test, Thor successfully served a 35B FP8 model through SGLang, while Orin 64GB delivered very fast responses with a small Qwen2.5 model through Ollama.

That gives a clear product direction:

Thor for high-end edge reasoning and robotics demos
Orin 64GB for practical local assistants, automation, and vision-plus-language prototypes

If you are choosing edge AI hardware, do not start with a spec sheet. Start with the model, the prompt, the camera or document input, and the latency target. Then benchmark the exact workflow.