Tired of paying for API calls and sharing your code with the cloud?

We’ve all been there: hitting rate limits on OpenAI’s API, worrying about sensitive code being uploaded to some distant server, or trying to code on a plane with spotty Wi-Fi. What if you could run powerful AI models right on your laptop, completely offline?
Enter Ollama: Local LLMs Made Ridiculously Simple
Ollama lets you run large language models locally with minimal setup. No cloud accounts, no API keys, no usage limits.
# Install it
brew install ollama
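Not on macOS or Homebrew? The Ollama docs also publish a one-line install script for Linux (and WSL):
# Install on Linux
curl -fsSL https://ollama.com/install.sh | sh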
Get up and running in 30 seconds
# Start the service
ollama serve
# In another terminal, pull a model
ollama pull llama2
# Start chatting
ollama run llama2
That’s it. You’re now running a powerful LLM completely offline.
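Want to confirm the server is actually up? Ollama exposes an HTTP API on port 11434, and listing your installed models is a quick smoke test:
curl http://localhost:11434/api/tags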
Choose the right model for your needs
Different tasks need different models. Here’s my cheat sheet:
# For coding help (6.7B parameters, fast on most laptops)
ollama pull codellama:7b
# For general assistance (balanced size/quality)
ollama pull llama2:13b
# For strong general-purpose reasoning (Mistral 7B punches above its weight)
ollama pull mistral
# For a tiny model that runs on just about anything
ollama pull tinyllama
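A small trick worth knowing: you can pass a one-off prompt straight to ollama run instead of opening an interactive session, which is handy in scripts (the prompt below is just an example):
ollama run codellama:7b "Write a bash one-liner that counts lines of JavaScript in this repo"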
Integrate with your workflow
The real magic happens when you integrate Ollama into your development tools:
VS Code: Install the “Continue” extension and point it at your local Ollama endpoint (http://localhost:11434).
Terminal: Create this simple shell function:
function ask() {
  # Let jq build the JSON body so quotes in the prompt are escaped safely
  curl -s "http://localhost:11434/api/generate" \
    -d "$(jq -n --arg p "$1" '{model: "codellama", prompt: $p, stream: false}')" |
    jq -r '.response'
}
Now you can just type:
ask "Write a bash function to find the largest files in a directory"
Git hooks: Create a pre-commit hook that uses Ollama to review your code changes:
#!/bin/bash
# Review staged JS/TS changes with a local CodeLlama model before committing
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep -E '\.(js|jsx|ts|tsx)$')
if [[ -z "$STAGED_FILES" ]]; then
  exit 0
fi
echo "Running AI code review..."
for FILE in $STAGED_FILES; do
  DIFF=$(git diff --cached "$FILE")
  # Let jq escape the diff (quotes, newlines) so the JSON body stays valid
  PAYLOAD=$(jq -n --arg p "Review this code change and suggest improvements: $DIFF" \
    '{model: "codellama", prompt: $p, stream: false}')
  REVIEW=$(curl -s "http://localhost:11434/api/generate" -d "$PAYLOAD" | jq -r '.response')
  echo -e "\n\033[1;34m$FILE Review:\033[0m\n$REVIEW\n"
done
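One gotcha: Git only runs the hook if it lives at .git/hooks/pre-commit and is executable, so after saving it:
chmod +x .git/hooks/pre-commit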
Performance tips that actually work
Running LLMs locally can be resource-intensive, but these tips make a huge difference:
- Quantized models: use 4-bit quantized variants for a 2-3x speedup
  ollama pull codellama:7b-q4_0
- Adjust the context window: a smaller context means faster responses (inside an ollama run session):
  /set parameter num_ctx 2048
- GPU acceleration: if you have a decent NVIDIA/AMD GPU, Ollama uses it automatically
- Prune unused models: they take up significant disk space
  ollama list
  ollama rm unused-model
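If you drive Ollama through the HTTP API instead of the CLI, the same context-window setting goes into the request's options field (the prompt here is just an example):
curl -s "http://localhost:11434/api/generate" \
  -d '{"model": "llama2", "prompt": "Summarize what a pre-commit hook does", "options": {"num_ctx": 2048}, "stream": false}' |
  jq -r '.response'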
A real-world use case
Last week, I was working on a client project with strict security requirements: no code could leave their network. With Ollama running locally, I could still get AI assistance for debugging a complex authentication flow without violating any policies.
The model helped me identify a race condition in the token refresh logic that would have taken hours to find manually. All without sending a single line of code to an external API.
Beyond the basics: Create your own custom models
The most powerful feature is creating custom models tailored to your needs:
FROM llama2
SYSTEM You are a specialized assistant for AWS infrastructure.
# Generation settings: sampling temperature and a 4K context window
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
Save this as Modelfile and build it:
ollama create aws-assistant -f Modelfile
ollama run aws-assistant
Now you have an AI assistant specialized in AWS, running completely offline.
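If you later forget what went into a custom model, ollama show should print its Modelfile back out:
ollama show aws-assistant --modelfile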
No more excuses for not using AI in your workflow
With Ollama, you get:
- Complete privacy
- No API costs
- Offline access
- Customizable models
The best part? It’s all open source. Give it a try and let me know how it transforms your development workflow!