Add domain filtering options to web_search tool in Responses API #2572

@jay-dhamale

Feature Request

Add the ability to filter web search results by including or excluding specific domains/websites in the OpenAI Responses API web_search tool.

Problem Statement

Currently, the web_search tool configuration in the OpenAI Responses API supports only basic parameters such as user_location:

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
                "city": "San Francisco",
            },
        }
    ],
    input="Search query here",
)

This limits users who need to:

  • Focus searches on specific trusted sources
  • Exclude known unreliable or irrelevant domains
  • Customize search results for specific use cases (e.g., academic research, official sources only)

Proposed Solution

Add two new optional parameters to the web_search tool configuration, similar to Perplexity AI's implementation:

  1. include_domains: List of domains to limit search results to
  2. exclude_domains: List of domains to exclude from search results

Example Implementation

Current OpenAI Implementation:

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
            },
        }
    ],
    input="Latest AI research papers",
)

Proposed Enhancement:

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "web_search_preview",
            "user_location": {
                "type": "approximate",
                "country": "US",
            },
            # New domain filtering parameters
            "include_domains": ["arxiv.org", "openai.com", "nature.com"],
            "exclude_domains": ["medium.com", "reddit.com"]
        }
    ],
    input="Latest AI research papers",
)
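
To make the intended semantics concrete, here is a minimal client-side sketch of how the two lists might behave. This is an assumption about the desired behavior, not how the API works today: the helper names `host_in_domains` and `filter_results` are hypothetical, a listed domain is assumed to cover its subdomains, and exclusion is assumed to take precedence over inclusion.

```python
from urllib.parse import urlsplit


def host_in_domains(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(
        host == d.lower() or host.endswith("." + d.lower()) for d in domains
    )


def filter_results(urls, include_domains=None, exclude_domains=None):
    """Keep only URLs allowed by the (hypothetical) include/exclude lists."""
    kept = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if exclude_domains and host_in_domains(host, exclude_domains):
            continue  # assumed rule: exclusion wins over inclusion
        if include_domains and not host_in_domains(host, include_domains):
            continue  # an include list restricts results to those domains
        kept.append(url)
    return kept


if __name__ == "__main__":
    results = [
        "https://arxiv.org/abs/1",
        "https://blog.medium.com/post",
        "https://example.com/page",
    ]
    print(filter_results(results, include_domains=["arxiv.org"]))
    print(filter_results(results, exclude_domains=["medium.com"]))
```

Filtering after the fact like this is only a stopgap: it cannot bring back relevant results that the search never returned, which is why server-side support is being requested.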

Reference: Perplexity AI Implementation

Perplexity's API already supports this functionality:

import requests

response = requests.post(
    "https://api.perplexity.ai/search",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "query": "machine learning research",
        "include_domains": ["arxiv.org", "ieee.org", "acm.org"],
        "exclude_domains": ["blogspot.com", "wordpress.com"]
    }
)

Use Cases

  1. Academic Research: Include only .edu domains and academic publishers
  2. Official Information: Focus on government (.gov) and organizational (.org) domains
  3. Technical Documentation: Include official documentation sites, exclude forums
  4. News Aggregation: Include trusted news sources, exclude tabloids
  5. Product Research: Include official vendor sites, exclude affiliate spam

Implementation Considerations

  • Both parameters should be optional to maintain backward compatibility
  • Support for wildcard patterns (e.g., *.edu, *.gov)
  • Consider adding validation for domain format
  • Document any limitations on the number of domains that can be filtered
  • Ensure the filtering happens at the search API level for efficiency
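
The validation and wildcard points above could look roughly like this. It is a sketch, not a proposed specification: `validate_domains`, `matches`, and the `MAX_DOMAINS` cap of 20 are all hypothetical, and the rule that a plain entry also covers its subdomains is an assumption.

```python
import re

MAX_DOMAINS = 20  # hypothetical cap; the real limit would need to be documented

# One DNS label: alphanumeric at both ends, hyphens allowed inside, max 63 chars.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
# A bare domain, optionally prefixed with a single leading "*." wildcard.
_PATTERN_RE = re.compile(rf"^(?:\*\.)?(?:{_LABEL}\.)*{_LABEL}$")


def validate_domains(domains):
    """Return normalized patterns, or raise ValueError on bad input."""
    if len(domains) > MAX_DOMAINS:
        raise ValueError(f"at most {MAX_DOMAINS} domains are allowed")
    normalized = []
    for raw in domains:
        pattern = raw.strip().lower()
        if not _PATTERN_RE.match(pattern):
            # Rejects schemes, paths, spaces, and misplaced wildcards.
            raise ValueError(f"invalid domain pattern: {raw!r}")
        normalized.append(pattern)
    return normalized


def matches(host, pattern):
    """Match a hostname against a plain domain or a '*.suffix' wildcard."""
    host = host.lower().rstrip(".")
    if pattern.startswith("*."):
        # '*.edu' matches 'mit.edu' but not the bare suffix 'edu'.
        return host.endswith(pattern[1:])
    # Assumed rule: a plain entry covers the domain and its subdomains.
    return host == pattern or host.endswith("." + pattern)


if __name__ == "__main__":
    patterns = validate_domains(["*.edu", "arxiv.org"])
    print([p for p in patterns if matches("cs.mit.edu", p)])
```

Rejecting full URLs such as `https://arxiv.org` at validation time gives callers an early, explicit error instead of a filter that silently matches nothing.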

Benefits

  • More precise and relevant search results
  • Better control over information sources
  • Reduced noise from unreliable sources
  • Improved efficiency by filtering early in the search process
  • Feature parity with competing APIs like Perplexity

Additional Considerations

The web_search tool could also benefit from:

  • A sites parameter to search within specific sites only (similar to Google's site: operator)
  • Support for excluding specific URL patterns, not just domains
  • Option to prioritize certain domains in results ranking
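
A sketch of how URL-pattern exclusion and domain prioritization might combine is shown below. The function name `apply_url_rules` and its parameters `exclude_url_patterns` and `priority_domains` are hypothetical, and glob-style patterns are only one possible pattern syntax.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def apply_url_rules(urls, exclude_url_patterns=(), priority_domains=()):
    """Drop URLs matching any glob pattern, then move priority domains first."""
    kept = [
        url
        for url in urls
        if not any(fnmatchcase(url, pat) for pat in exclude_url_patterns)
    ]

    def rank(url):
        host = urlsplit(url).hostname or ""
        for position, domain in enumerate(priority_domains):
            if host == domain or host.endswith("." + domain):
                return position  # earlier in the list means higher priority
        return len(priority_domains)  # non-priority results go last

    # sorted() is stable, so the original order is kept within each rank.
    return sorted(kept, key=rank)


if __name__ == "__main__":
    results = [
        "https://example.com/forum/1",
        "https://example.com/docs/a",
        "https://arxiv.org/abs/1",
    ]
    print(apply_url_rules(results, ["*/forum/*"], ["arxiv.org"]))
```

Note that glob syntax treats `?` and `[` as special characters, so a real implementation would need to define how patterns containing query strings are escaped.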

This enhancement would significantly improve the utility of the web_search tool for developers building applications that require high-quality, domain-specific information retrieval.
