The Pointer Crawler enables you to automatically gather and analyze content from your product, creating a comprehensive knowledge base for AI-powered features.

Prerequisites

Installation

Install the Pointer CLI globally using npm:

npm install -g pointer-cli

Verify the installation:

pointer --version

Authentication

Create an API key

1. Navigate to API Keys

Go to your Keys settings in the Pointer dashboard.

2. Generate new key

Click Create new key and provide:

  • Name: Descriptive identifier (e.g., “CLI Production”)
  • Description: Optional context about key usage
  • Expiration: Optional expiry date (defaults to never expire)

3. Copy your secret key

Save the generated key immediately - it won’t be shown again. Keys follow the format:

pt_sec_*****************************************

Configure authentication

Set your secret key using one of these methods:

export POINTER_SECRET_KEY="pt_sec_your_key_here"
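
Alternatively, pass the key per invocation with the global -s, --secret-key option (see Global options below):

pointer scrape -s "pt_sec_your_key_here"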

Environment variables are recommended for security. Command-line options may expose keys in shell history.

Core workflow

Step 1: Initialize your website

Start by adding your website to the crawler configuration:

pointer init

The interactive prompt will guide you through:

  1. Entering a friendly name for identification
  2. Providing your website URL
  3. Confirming the configuration

Step 2: Scrape your content

Begin the automated content collection:

pointer scrape

Choose from interactive options:

  • Scraping mode: Headless (fast) or Browser (with authentication)
  • Crawl depth: Fast (surface content) or Deep (interactive elements)
  • PII protection: Configure sensitivity and redaction settings

The CLI saves your progress automatically. If interrupted, it will offer to resume from where it left off.
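
If you already know these settings, the corresponding flags (documented under Scraping options below) let you skip the prompts; for example, a fast crawl of up to 200 pages with medium PII sensitivity:

pointer scrape --fast --max-pages 200 --pii-sensitivity medium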

Step 3: Upload for analysis

Send your scraped content to Pointer for processing:

pointer upload

The CLI will:

  1. Display a summary of collected data
  2. Confirm the upload scope
  3. Transfer content to your knowledge base

Command reference

Primary commands

Command          Description                                Authentication
pointer init     Add a website to crawl                     Required
pointer scrape   Collect content from configured websites   Required
pointer upload   Transfer scraped data to Pointer           Required
pointer status   Check crawl processing status              Required
pointer list     View local scraped data                    Not required
pointer cleanup  Remove all local data                      Not required
pointer purge    Delete server-side crawl data              Required
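
For example, the two commands that need no API key can be used to inspect and then clear locally scraped data:

# Review what has been collected locally, then remove it
pointer list
pointer cleanup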

Global options

Available for all commands:

Option                   Description
-s, --secret-key <key>   API secret key (overrides environment variable)
-v, --version            Display CLI version
--help                   Show command help

Scraping options

Configure pointer scrape behavior:

Option                        Description                                          Default
--max-pages <number>          Maximum pages to crawl                               200
--concurrency <number>        Parallel page processing                             1
--fast                        Use fast crawl mode                                  Interactive prompt
--no-pii-protection           Disable PII detection                                PII protection enabled
--pii-sensitivity <level>     Set detection level (low/medium/high)                Interactive prompt
--exclude-routes <patterns>   Comma-separated routes to exclude                    None
--include-routes <patterns>   Comma-separated routes to include (whitelist mode)   None
--bearer-token <token>        Bearer token for API authentication                  None
--headers <json>              Custom headers as JSON string                        None
--cookies-file <path>         Path to cookies JSON file                            None
--log-level <level>           Logging verbosity                                    info

Excluding routes

The --exclude-routes flag allows you to specify routes that should be excluded from scraping. This is useful for avoiding admin panels, API endpoints, or specific file types.

# Exclude a single route
pointer scrape --exclude-routes "/admin"

# Exclude multiple routes (comma-separated)
pointer scrape --exclude-routes "/admin,/api,/private"

# Use glob patterns to exclude multiple matching routes
pointer scrape --exclude-routes "/admin/*,/api/*,*.pdf"

Pattern types:

  • Exact match: /admin - excludes only the exact path
  • Wildcard patterns:
    • /admin/* - excludes all paths starting with /admin/
    • *.pdf - excludes all PDF files
    • /api/*/docs - excludes paths like /api/v1/docs, /api/v2/docs

The exclusion check is performed on the URL path only (not the full URL). Patterns are case-sensitive, and the start URL cannot be excluded.

Including routes (whitelist mode)

The --include-routes flag allows you to specify which routes should be included in scraping. When used, ONLY matching routes will be scraped.

# Only scrape product pages
pointer scrape --include-routes "/products/*"

# Only scrape specific sections
pointer scrape --include-routes "/blog/*,/docs/*,/tutorials/*"

# Combine with exclude for fine-grained control
pointer scrape --include-routes "/api/*" --exclude-routes "/api/internal/*,*.pdf"

Include vs Exclude Logic:

  • If --include-routes is specified, a URL must match at least one include pattern to be scraped
  • If both --include-routes and --exclude-routes are specified:
    1. URL must match an include pattern
    2. URL must NOT match any exclude pattern
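
As a worked example of this logic (the paths are illustrative):

pointer scrape --include-routes "/docs/*" --exclude-routes "/docs/internal/*"
# /docs/getting-started   -> scraped (matches an include pattern, no exclude match)
# /docs/internal/secrets  -> skipped (matches an exclude pattern)
# /blog/launch-post       -> skipped (matches no include pattern)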

Authentication options

Bearer token authentication

Use for APIs that require bearer token authentication:

pointer scrape --bearer-token "sk-proj-abc123xyz789"

This adds the header: Authorization: Bearer sk-proj-abc123xyz789

Custom headers

Add any custom headers required by the target website:

# Single header
pointer scrape --headers '{"X-API-Key": "my-api-key"}'

# Multiple headers
pointer scrape --headers '{"X-API-Key": "key123", "X-Client-ID": "client456"}'

# Headers with authentication
pointer scrape --headers '{"Authorization": "Basic dXNlcjpwYXNz", "Accept": "application/json"}'

Cookies file

Load cookies from a JSON file for session-based authentication:

pointer scrape --cookies-file ./cookies.json

The cookies file should be in Playwright’s cookie format:

[
  {
    "name": "session_id",
    "value": "abc123xyz",
    "domain": ".example.com",
    "path": "/",
    "expires": -1,
    "httpOnly": true,
    "secure": true,
    "sameSite": "Lax"
  },
  {
    "name": "auth_token", 
    "value": "token456",
    "domain": ".example.com",
    "path": "/"
  }
]
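
One way to produce such a file - a sketch, assuming Playwright and jq are installed and that you log in manually in the browser codegen opens - is to save a Playwright storage state and extract its cookies array:

# Log in manually; the session is written to storage-state.json on exit
npx playwright codegen --save-storage=storage-state.json https://example.com

# The storage state's "cookies" array uses the cookie format shown above
jq '.cookies' storage-state.json > cookies.json

pointer scrape --cookies-file ./cookies.json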

Combined examples

Scraping protected API documentation

pointer scrape \
  --include-routes "/api/v2/docs/*" \
  --exclude-routes "*.pdf,*.zip" \
  --bearer-token "your-api-token" \
  --headers '{"Accept": "text/html"}' \
  --max-pages 100

Scraping an e-commerce site with login

  1. First, save your cookies after manual login:

# Use browser mode to log in manually
pointer scrape --mode browser --save-session

  2. Then use the saved cookies for subsequent scrapes:

pointer scrape \
  --include-routes "/products/*,/categories/*" \
  --exclude-routes "/products/*/reviews,/checkout/*" \
  --cookies-file ./scraped-data/.auth/yoursite.json \
  --concurrency 5

Scraping with multiple authentication methods

pointer scrape \
  --bearer-token "api-token-123" \
  --headers '{"X-Client-Version": "2.0", "Accept-Language": "en-US"}' \
  --include-routes "/api/*" \
  --log-level debug

Best practices

Automation examples

While the CLI is designed for interactive use, automation is supported for CI/CD pipelines:

# Automated crawling with predetermined settings
pointer scrape --max-pages 100 --concurrency 5 --fast --no-pii-protection

# Direct status check for specific crawl
pointer status --crawl-id abc123 --pages

# Skip confirmations for scripted cleanup
pointer purge --crawl-id abc123 --force

Use automation options carefully. Interactive mode provides safety confirmations and validation that prevent common errors.

Troubleshooting

Authentication errors

If you encounter authentication issues:

  1. Verify your API key is valid in the dashboard
  2. Check that the environment variable is set correctly: echo $POINTER_SECRET_KEY (see the check below)
  3. Ensure the key hasn’t expired
  4. Confirm you have necessary permissions
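
A quick shell check for step 2, assuming the key was exported as described under Configure authentication:

# Succeeds only if the variable is set and starts with the pt_sec_ prefix
echo "$POINTER_SECRET_KEY" | grep -q '^pt_sec_' && echo "Key format looks OK" || echo "POINTER_SECRET_KEY is missing or malformed"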

Crawling interruptions

The crawler automatically saves progress. If interrupted:

pointer scrape
# Will prompt: "Resume from where it left off?"

Upload limitations

  • Maximum 500 pages per upload (API limit)
  • Large crawls are automatically truncated
  • Use --max-pages to control crawl size upfront
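
For example, to keep a single crawl within the upload limit:

pointer scrape --max-pages 500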

Next steps

After successfully crawling and uploading your content:

  1. View your enriched knowledge base in the Knowledge section
  2. Configure AI features to leverage the collected data
  3. Monitor analytics to understand content usage
  4. Set up regular crawls to keep knowledge current