Ever launched a Screaming Frog crawl only to have it crash halfway through? Or spent hours crawling a site only to realize you missed crucial data? You're not alone. The difference between a successful SEO audit and a frustrating time sink often comes down to one thing: proper configuration.
Why This Checklist Matters
Think of Screaming Frog configuration like setting up a high-performance race car. You wouldn't just jump in and floor it; you need the right setup for the track conditions. The same goes for your crawls. Whether you're auditing a small business website or crawling an enterprise platform with millions of URLs, your configuration needs to match your specific requirements.
Who This Guide Is For
- SEO Professionals looking to optimize their technical audits
- Agency Teams managing multiple client websites
- In-house SEOs dealing with large-scale websites
- Website Administrators conducting regular site health checks
What You'll Learn
- How to configure your crawler for optimal performance
- Memory management techniques for sites of any size
- Essential filter patterns that save hours of post-processing
- Custom extraction setups that capture exactly what you need
- Testing protocols that prevent mid-crawl disasters
- Real-time monitoring strategies to ensure data quality
The Impact of Proper Configuration
- Faster crawl completions
- More accurate data collection
- Efficient resource usage
- Fewer failed crawls
- Better-quality insights
Let's dive into the six critical areas of Screaming Frog configuration that can make or break your SEO audits.
1. Setting Appropriate Speed and Threads
Understanding Thread Count
The number of threads determines how many parallel requests Screaming Frog makes to a website. Here's how to optimize it:
For Small Websites (under 10,000 URLs)
- Start with 5 threads
- Max speed of 2-3 requests per second
- Monitor server response times
- Ideal for shared hosting environments
For Medium Websites (10,000-100,000 URLs)
- Use 7-10 threads
- Speed of 3-5 requests per second
- Good for most business websites
- Balance between speed and server load
For Large Websites (100,000+ URLs)
- Up to 15 threads
- 5+ requests per second
- Only for enterprise-level hosting
- Monitor closely for the first 10 minutes
Speed Configuration Tips
- Start conservative and increase gradually
- Watch status codes in real time
- Check server response times (a quick baseline check follows this list)
- Monitor crawl rate stability
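Before you settle on a thread count, it helps to baseline how the server responds to a single polite client. Here's a minimal sketch in Python using the `requests` library; the URL is a placeholder, so swap in the site you plan to crawl:

```python
import time
import requests

URL = "https://example.com/"  # placeholder: the site you plan to crawl
SAMPLES = 5

times = []
for _ in range(SAMPLES):
    start = time.monotonic()
    response = requests.get(URL, timeout=10)
    elapsed = time.monotonic() - start
    times.append(elapsed)
    print(f"{response.status_code} in {elapsed:.2f}s")
    time.sleep(1)  # stay polite: roughly one request per second

print(f"Average response time: {sum(times) / len(times):.2f}s")
# If the average is already close to 2s with a single client, start at
# the low end (5 threads or fewer) and increase gradually.
```

If responses are fast and stable, you have headroom to step threads up. If they're already slow, more threads will only make the warning signs below appear sooner.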
Warning Signs to Watch
- Increased 5XX errors
- Slower response times
- Timeout errors
- Robots.txt blocks
2. Memory Allocation Configuration
RAM Settings Based on Site Size
```
Small Sites (under 10k URLs):
- Minimum: 2GB RAM
- Recommended: 4GB RAM

Medium Sites (10k-100k URLs):
- Minimum: 4GB RAM
- Recommended: 8GB RAM

Large Sites (100k+ URLs):
- Minimum: 8GB RAM
- Recommended: 16GB RAM
```
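Under the hood, Screaming Frog is a Java application, so the value you choose in the Memory Allocation setting becomes the JVM heap limit. On older Windows installs this could also be checked or set by editing the spider's launcher config file; treat the exact filename and location as version-dependent. An 8GB allocation looks like this:

```
# ScreamingFrogSEOSpider.l4j.ini (older Windows installs; newer versions
# manage this through the Memory Allocation setting in the UI)
-Xmx8g
```

If you'd rather not touch files, the in-app setting does the same thing and takes effect after a restart.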
Memory Management Best Practices
Database Storage Mode
- Enable for sites over 500k URLs
- Reduces RAM usage
- Slower but more stable
Temporary File Location
- Use an SSD when possible
- Set a custom location for large crawls
- Clean it out regularly
Memory Monitoring
- Watch RAM usage in your task manager
- Set up alerts for high usage
- Have a cleanup procedure ready
3. Essential URL Filters
Must-Have Exclude Patterns
Keep in mind that Screaming Frog's include and exclude filters match regular expressions against the full URL, so wildcards need to be written as `.*`:
```
# Non-content URLs
.*/thank-you/.*
.*/cart/.*
.*/checkout/.*
.*/my-account/.*

# Parameter exclusions
.*\?utm_.*
.*[?&]fbclid=.*
.*[?&]gclid=.*

# File types
.*\.pdf$
.*\.jpg$
.*\.png$
```
Critical Include Patterns
```
# Content areas
.*/product/.*
.*/category/.*
.*/blog/.*
.*/news/.*

# Important pages
.*/about/.*
.*/contact/.*
.*/services/.*
```
Filter Strategy Tips
- Start broad, then narrow
- Document all exclusions
- Test on sample URLs
- Test every regular expression before the full crawl (a quick sketch follows this list)
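Because both include and exclude filters are regex, the fastest sanity check is to run a handful of known URLs through your patterns before launching the crawl. Here's a minimal sketch in Python; the patterns are a subset of the examples above, and the sample URLs are made up for illustration:

```python
import re

# Illustrative subset of the exclude patterns above.
EXCLUDES = [
    r".*/cart/.*",
    r".*\?utm_.*",
    r".*\.pdf$",
]

# Made-up URLs you expect to keep or drop.
samples = [
    "https://example.com/product/blue-widget/",
    "https://example.com/cart/",
    "https://example.com/blog/post/?utm_source=newsletter",
    "https://example.com/files/brochure.pdf",
]

compiled = [re.compile(p) for p in EXCLUDES]
for url in samples:
    # Screaming Frog matches patterns against the full URL,
    # so fullmatch() is the closer analogue here.
    excluded = any(p.fullmatch(url) for p in compiled)
    status = "EXCLUDE" if excluded else "KEEP"
    print(f"{status:<8}{url}")
```

The same idea works for include patterns: flip the check and confirm the URLs you care about actually match.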
4. Custom Extraction Setup
Essential Extractions
```
# SEO elements
<title>
<meta name="description">
<meta name="robots">
<link rel="canonical">

# Content elements
<h1>
<img alt="">
<a href="">
```
Advanced Custom Search
XPath Examples
```
//div[@class='product-price']
//span[contains(@class, 'sku')]
//meta[@property='og:title']/@content
```
CSS Selector Examples
```
.product-description
#main-content
[data-testid="price"]
```
Regular Expressions
```
price:\s*\$(\d+\.?\d*)
sku:\s*(\w+)
```
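Before wiring expressions into Custom Extraction, prototype them against a real page so a bad XPath doesn't silently return nothing across 100,000 URLs. A minimal sketch in Python using `requests` and `lxml` (both assumed installed; the URL is a placeholder and the expressions are the examples above):

```python
import requests
from lxml import html

URL = "https://example.com/product/blue-widget/"  # placeholder product page

tree = html.fromstring(requests.get(URL, timeout=10).content)

xpaths = {
    "price": "//div[@class='product-price']",
    "sku": "//span[contains(@class, 'sku')]",
    "og_title": "//meta[@property='og:title']/@content",
}

for name, expr in xpaths.items():
    results = tree.xpath(expr)
    # Attribute queries return strings; element queries need .text_content().
    values = [r if isinstance(r, str) else r.text_content().strip()
              for r in results]
    print(f"{name}: {values if values else 'NO MATCH - revisit the expression'}")
```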
5. Testing Configuration
Pre-Crawl Test Protocol
Small Section Test
- Choose a representative section
- Crawl 100-200 URLs
- Verify data accuracy
- Check extraction patterns
Configuration Validation
- Test all custom extractions
- Verify filter patterns
- Check speed impact
- Validate memory usage
Test Documentation Template
```
Test Date: [DATE]
Section Tested: [URL SECTION]
URLs Crawled: [NUMBER]
Issues Found: [LIST]
Configuration Adjustments: [CHANGES MADE]
```
6. Monitoring and Adjustment
Key Metrics to Monitor
Performance Metrics
- Crawl rate
- Response times
- Memory usage
- CPU utilization
Quality Metrics
- Status codes
- Extraction success rates (a quick way to measure these follows this list)
- Filter effectiveness
- Data accuracy
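Extraction success rate is easy to quantify after a test crawl: export the custom extraction report and count how many rows came back empty. A minimal sketch using Python's standard csv module (the filename and column name are placeholders for whatever your export contains):

```python
import csv

# Placeholders: adjust to your actual export file and extractor column.
EXPORT_FILE = "custom_extraction.csv"
COLUMN = "price"

total = 0
hits = 0
with open(EXPORT_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        total += 1
        if row.get(COLUMN, "").strip():
            hits += 1

rate = 100 * hits / total if total else 0
print(f"{COLUMN}: {hits}/{total} URLs extracted ({rate:.1f}%)")
# A low rate usually means the XPath/CSS pattern misses a template variant.
```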
Adjustment Triggers
```
Response Time > 2s: Reduce threads
Memory Usage > 80%: Enable database mode
5XX Errors > 1%: Reduce speed
Timeout Errors: Increase wait time
```
Regular Monitoring Schedule
- First 5 minutes: constant monitoring
- First hour: check every 15 minutes
- Ongoing: check every 30 minutes
- Large crawls: set up alerts (a minimal watcher sketch follows this list)
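For the "set up alerts" step, even a simple watcher that polls system memory against the 80% trigger above beats staring at Task Manager for hours. A minimal sketch in Python using `psutil` (assumed installed; the threshold and cadence are the illustrative values from this section):

```python
import time
import psutil

MEMORY_THRESHOLD = 80.0   # percent, matching the adjustment trigger above
CHECK_INTERVAL = 30 * 60  # seconds: the 30-minute ongoing cadence

while True:
    usage = psutil.virtual_memory().percent
    stamp = time.strftime("%H:%M:%S")
    if usage >= MEMORY_THRESHOLD:
        # Per the trigger table: time to switch to database storage mode.
        print(f"[{stamp}] ALERT: memory at {usage:.0f}% - enable database mode")
    else:
        print(f"[{stamp}] memory at {usage:.0f}%")
    time.sleep(CHECK_INTERVAL)
```

Swap the print for an email or Slack webhook if the crawl runs overnight.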
Debugging Common Issues
Performance Problems
Slow Crawl Speed
- Check network connection
- Verify thread settings
- Monitor server response
- Check for rate limiting
Memory Issues
- Enable database mode
- Reduce concurrent threads
- Clear temporary files
- Increase allocated RAM
Data Quality Issues
- Verify regex patterns
- Check XPath accuracy
- Update CSS selectors
- Review filter rules
Configuration Template
```yaml
Basic Configuration:
  Threads: 7
  Speed: 3 requests/second
  RAM: 8GB
  Database Mode: Enabled for >500k URLs
Filters:
  Include: [list from above]
  Exclude: [list from above]
Extractions:
  SEO: [elements from above]
  Custom: [specific needs]
Monitoring:
  Initial: 5-minute intervals
  Ongoing: 30-minute intervals
  Alerts: Configured for critical metrics
```
Remember: configuration is iterative. What works for one site might not work for another, so always start conservative and adjust based on actual performance.