NVIDIA Llama 3.1 Nemotron Instruct 70B

NVIDIA's fine-tuned version of Meta's Llama 3.1 70B Instruct model

Creator: NVIDIA
License: Open

Summary

Artificial Analysis Quality Index: 70
Output Speed (tokens per second): 47
Model Pricing (USD per 1M tokens): $0.400
Latency (time to receive the first token): 14.21 s
Context Window (tokens): 128k
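
As a rough illustration of how the summary figures combine, the sketch below estimates the total time and cost of a single response. It assumes a simple model that the page itself does not state: the first token arrives after the listed latency, the remaining tokens stream at the listed output speed, and every token is billed at the one listed rate. The function names are hypothetical and chosen only for this example.

```python
# Figures copied from the summary above.
OUTPUT_SPEED_TPS = 47.0          # output tokens per second
FIRST_TOKEN_LATENCY_S = 14.21    # seconds until the first token arrives
PRICE_PER_1M_TOKENS_USD = 0.400  # USD per 1M tokens


def estimate_response_seconds(output_tokens: int) -> float:
    """Assumed model: first-token latency, then streaming at a constant rate."""
    return FIRST_TOKEN_LATENCY_S + output_tokens / OUTPUT_SPEED_TPS


def estimate_cost_usd(total_tokens: int) -> float:
    """Assumption: one flat rate applies to all tokens in the request."""
    return total_tokens * PRICE_PER_1M_TOKENS_USD / 1_000_000


if __name__ == "__main__":
    print(f"470 output tokens: about {estimate_response_seconds(470):.2f} s")
    print(f"10,000 tokens: about ${estimate_cost_usd(10_000):.4f}")
```

Under these assumptions, the first-token latency dominates short responses: a 470-token reply spends more time waiting for the first token than streaming the rest.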

Comparison with other models

Quality

Artificial Analysis Quality Index; Higher is better

Speed

Output Tokens per Second; Higher is better

Pricing

USD per 1M Tokens; Lower is better

Latency

Seconds to First Token Chunk Received; Lower is better
