Feature Request
Add the ability to filter web search results by including or excluding specific domains/websites in the OpenAI Responses API web_search tool.
Problem Statement
Currently, the OpenAI Responses API web_search tool configuration only supports basic parameters:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
                "city": "San Francisco",
            },
        }
    ],
    input="Search query here",
)
This limits users who need to:
- Focus searches on specific trusted sources
- Exclude known unreliable or irrelevant domains
- Customize search results for specific use cases (e.g., academic research, official sources only)
Proposed Solution
Add two new optional parameters to the web_search tool configuration, similar to Perplexity AI's implementation:
- include_domains: List of domains to limit search results to
- exclude_domains: List of domains to exclude from search results
Example Implementation
Current OpenAI Implementation:
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
            },
        }
    ],
    input="Latest AI research papers",
)
Proposed Enhancement:
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
            },
            # New domain filtering parameters (proposed, not yet supported)
            "include_domains": ["arxiv.org", "openai.com", "nature.com"],
            "exclude_domains": ["medium.com", "reddit.com"],
        }
    ],
    input="Latest AI research papers",
)
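The intended semantics of the two proposed parameters can be sketched as a plain filter over result URLs. This is only an illustration of the requested behavior, not how OpenAI or any provider implements it: the helper names `host_matches` and `filter_results` are hypothetical, and the sketch assumes that a listed domain also covers its subdomains and that exclusions win over inclusions.

```python
from urllib.parse import urlparse


def host_matches(host, domain):
    """Return True if host equals domain or is a subdomain of it (assumed semantics)."""
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def filter_results(urls, include_domains=None, exclude_domains=None):
    """Hypothetical helper: keep URLs allowed by the include/exclude lists.

    Both lists are optional, so calling this with neither returns the
    input unchanged, mirroring the backward-compatible behavior requested.
    """
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        # An include list, when given, acts as an allow-list.
        if include_domains and not any(host_matches(host, d) for d in include_domains):
            continue
        # An exclude list always removes matching hosts.
        if exclude_domains and any(host_matches(host, d) for d in exclude_domains):
            continue
        kept.append(url)
    return kept
```

For example, `filter_results(urls, include_domains=["arxiv.org"], exclude_domains=["medium.com"])` would keep only URLs hosted on arxiv.org or one of its subdomains.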
Reference: Perplexity AI Implementation
Perplexity's API already supports this functionality:
import requests

response = requests.post(
    "https://api.perplexity.ai/search",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "query": "machine learning research",
        "include_domains": ["arxiv.org", "ieee.org", "acm.org"],
        "exclude_domains": ["blogspot.com", "wordpress.com"],
    },
)
Use Cases
- Academic Research: Include only .edu domains and academic publishers
- Official Information: Focus on government (.gov) and organizational (.org) domains
- Technical Documentation: Include official documentation sites, exclude forums
- News Aggregation: Include trusted news sources, exclude tabloids
- Product Research: Include official vendor sites, exclude affiliate spam
Implementation Considerations
- Both parameters should be optional to maintain backward compatibility
- Support for wildcard patterns (e.g., *.edu, *.gov)
- Consider adding validation for domain format
- Document any limitations on the number of domains that can be filtered
- Ensure the filtering happens at the search API level for efficiency
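A minimal sketch of how the validation, wildcard, and limit considerations above might fit together. Everything here is an assumption made for illustration: the cap `MAX_DOMAINS`, the helper names `validate_patterns` and `matches`, and the rule that only a leading `*.` wildcard is accepted are not part of any existing API.

```python
import re

MAX_DOMAINS = 20  # hypothetical limit; a real API would document its own

# One DNS label: alphanumeric at both ends, hyphens allowed inside.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
# Optional leading "*." wildcard followed by one or more dot-separated labels.
_PATTERN_RE = re.compile(rf"(?:\*\.)?(?:{_LABEL}\.)*{_LABEL}")


def validate_patterns(patterns):
    """Normalize and validate a list of domain patterns, raising ValueError on bad input."""
    if len(patterns) > MAX_DOMAINS:
        raise ValueError(f"at most {MAX_DOMAINS} domains are allowed")
    cleaned = []
    for pattern in patterns:
        pattern = pattern.strip().lower()
        # Rejects schemes, paths, underscores, and wildcards in other positions.
        if not _PATTERN_RE.fullmatch(pattern):
            raise ValueError(f"invalid domain pattern: {pattern!r}")
        cleaned.append(pattern)
    return cleaned


def matches(host, pattern):
    """Return True if host matches a validated pattern."""
    host = host.lower()
    if pattern.startswith("*."):
        # "*.edu" matches "mit.edu" but not the bare suffix "edu".
        return host.endswith(pattern[1:])
    # A plain domain matches itself and its subdomains (assumed semantics).
    return host == pattern or host.endswith("." + pattern)
```

Rejecting malformed input up front, rather than silently ignoring it, keeps a typo in a domain list from quietly widening the search.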
Benefits
- More precise and relevant search results
- Better control over information sources
- Reduced noise from unreliable sources
- Improved efficiency by filtering early in the search process
- Feature parity with competing APIs like Perplexity
Additional Considerations
The web_search tool could also benefit from:
- A sites parameter to search within specific sites only (similar to Google's site: operator)
- Support for excluding specific URL patterns, not just domains
- Option to prioritize certain domains in results ranking
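The prioritization idea in the last item could look roughly like the following: a stable re-ranking that moves preferred domains to the front without dropping anything. The function name `prioritize` and the `preferred_domains` argument are hypothetical, and a real implementation would presumably blend this with the search engine's own relevance score rather than override it.

```python
from urllib.parse import urlparse


def prioritize(urls, preferred_domains):
    """Hypothetical helper: order URLs so preferred domains come first.

    Earlier entries in preferred_domains rank higher. Python's sort is
    stable, so the original order is preserved within each rank.
    """
    def rank(url):
        host = (urlparse(url).hostname or "").lower()
        for index, domain in enumerate(preferred_domains):
            if host == domain or host.endswith("." + domain):
                return index
        # Non-preferred hosts share the lowest priority.
        return len(preferred_domains)

    return sorted(urls, key=rank)
```

Unlike include_domains, this keeps every result and only changes the order.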
This enhancement would significantly improve the utility of the web_search tool for developers building applications that require high-quality, domain-specific information retrieval.
MengAiDev