MultimodalFlow
← Back to Blog

Jetson Thor vs Jetson Orin 64GB: Local LLM Benchmark Notes

JetsonThorOrinLLMbenchmarkQwenSGLangOllama

Edge AI hardware is no longer just for camera pipelines and lightweight detection models. With enough memory and the right runtime, devices like Jetson Thor and Jetson Orin 64GB can run local language models for robotics, inspection, private assistants, and on-site automation.

I tested two real devices on my lab network:

  • Jetson Thor at 100.98.202.31
  • Jetson Orin 64GB at 100.97.175.73

This is not a synthetic leaderboard. It is a practical engineering note: what was running, what responded, how fast it felt, and what I would use each device for.

Test Setup

DeviceRuntimeModelAPI
Jetson ThorSGLangQwen3.6-35B-A3B-FP8OpenAI-compatible /v1/chat/completions
Jetson Orin 64GBOllamaQwen2.5 0.5B Q4_K_MOllama /api/generate

The prompts covered three common local AI tasks:

  1. Chinese edge AI scenario explanation
  2. English summarization
  3. Short code explanation

Each run used a small response budget of about 160 output tokens. This is enough to compare interactive response behavior without turning the test into a long-generation benchmark.

Quick Results

DeviceModelTest CasesResult
Jetson ThorQwen3.6-35B-A3B-FP83 / 3 passed~11.2s per 160-token response
Jetson Orin 64GBQwen2.5 0.5B Q4_K_M3 / 3 passed~145 tokens/s

The Thor result should be read with context: the device was already busy, with an SGLang scheduler and another Python service consuming GPU/CPU resources. Even so, the 35B FP8 model completed every prompt successfully.

The Orin result used a much smaller model, so it is not a quality comparison. It is a useful speed baseline for lightweight local assistants and embedded control flows.

Thor: Large Local Model on Edge Hardware

Thor was running:

Qwen3.6-35B-A3B-FP8
SGLang
Context length: 8192
Served model name: qwen3.6

Measured response times:

CaseCompletion TokensTime
Chinese Q&A16010.953s
English summary16011.181s
Code explanation16011.511s

That puts the observed speed around 14 tokens/s for this loaded test condition.

Practical Meaning

Thor is the more interesting platform for heavier local LLM workloads:

  • local customer service assistant with private product documents
  • robotics scene reasoning with camera and LiDAR context
  • industrial inspection reports generated near the device
  • internal knowledge base search without sending data to the cloud

For production, I would test:

  • warm vs cold latency
  • concurrent requests
  • prompt length sensitivity
  • quantization choices
  • JSON output reliability
  • vision-language model throughput

Orin 64GB: Fast Small-Model Local Inference

Orin 64GB was running Ollama with:

qwen2.5:0.5b
GGUF Q4_K_M

Measured results:

CaseOutput TokensTimeSpeed
Chinese Q&A1603.255s144.28 tokens/s
English summary890.901s147.06 tokens/s
Code explanation500.600s145.22 tokens/s

This is exactly the kind of result that makes small local models useful. They may not replace a large cloud model, but they are fast enough for:

  • menu-driven device assistants
  • simple Chinese and English Q&A
  • local command parsing
  • short summaries
  • workflow automation
  • offline demo experiences

Which Device Should You Use?

Choose Jetson Thor if you need:

  • larger models
  • more complex reasoning
  • multimodal robotics demos
  • SGLang or OpenAI-compatible local serving
  • a stronger platform for future VLM testing

Choose Jetson Orin 64GB if you need:

  • stable edge deployment
  • lower cost than a larger platform
  • small LLM assistants
  • computer vision plus lightweight language output
  • local automation and industrial demos

Business Takeaway

The best commercial angle is not "which device has the best benchmark number." The better offer is:

We test the actual model, on the actual edge device, for the actual workflow before the customer buys or deploys hardware.

That is useful for device distributors, factories, robotics teams, and small businesses that want AI but do not know whether they need Thor, Orin, a workstation, or a cloud API.

For service packaging, these benchmark notes map cleanly to three offers:

  1. Edge AI model feasibility test
  2. Private local AI assistant demo
  3. Robot perception and reporting prototype

Next Tests

The next benchmark round should add:

  • Qwen 7B or 14B on Orin 64GB
  • the same prompt set across Thor and Orin
  • concurrent request testing
  • memory and power logging
  • vision-language prompts with camera frames
  • long-context RAG prompts with product documents

Those tests will be more useful than a one-time speed number because they map directly to real deployment decisions.

Bottom Line

In this first local test, Thor successfully served a 35B FP8 model through SGLang, while Orin 64GB delivered very fast responses with a small Qwen2.5 model through Ollama.

That gives a clear product direction:

  • Thor for high-end edge reasoning and robotics demos
  • Orin 64GB for practical local assistants, automation, and vision-plus-language prototypes

If you are choosing edge AI hardware, do not start with a spec sheet. Start with the model, the prompt, the camera or document input, and the latency target. Then benchmark the exact workflow.