Firecrawler

A lightweight frontend for self-hosted Firecrawl API instances. This playground provides a user-friendly interface for using Firecrawl's web scraping and crawling capabilities.

Features

  • Scrape Mode: Convert a single URL to markdown or HTML, capture a screenshot, or extract its links
  • Crawl Mode: Discover and scrape multiple pages from a starting URL
  • Extract Mode: Extract structured data from web pages using an LLM
  • CORS-free: Uses a proxy server to avoid CORS issues when connecting to your Firecrawl API instance

(Screenshot of the Firecrawler interface)

Getting Started

Running Locally

  1. Configure environment variables:
    cp .example.env .env
    Edit the .env file to set your desired configuration.
  2. Install dependencies and run:
    npm i
    npm start
  3. Open your browser and navigate to http://localhost:3000
  4. Enter your Firecrawl API endpoint (default: http://firecrawl:3002)
  5. Enter your API key if required
  6. Choose a mode (Scrape, Crawl, or Extract), enter a URL, and click "Run"
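
If the UI can't connect, it can help to confirm that the Firecrawl instance itself is reachable before debugging the playground. Here is a minimal sketch using Node 18+ (which ships a global fetch); the endpoint URL is the default mentioned above, and the commented-out Authorization header is only needed if your instance requires a key:

    // Minimal reachability check for a self-hosted Firecrawl instance.
    // Adjust the endpoint to match your setup.
    const endpoint = 'http://firecrawl:3002'; // or e.g. http://localhost:3002

    fetch(`${endpoint}/v1/scrape`, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        // authorization: 'Bearer YOUR_API_KEY', // uncomment if your instance needs it
      },
      body: JSON.stringify({ url: 'https://example.com', formats: ['markdown'] }),
    })
      .then((res) => res.json())
      .then((data) => console.log(data.success ? 'Firecrawl is reachable' : data))
      .catch((err) => console.error('Could not reach Firecrawl:', err.message));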

Using Docker

  1. Configure environment variables:
    cp .example.env .env
    Then edit the .env file to set your desired configuration.
  2. Build and run using Docker Compose:
    docker-compose up -d
  3. Open your browser and navigate to http://localhost:3000

Modes

Scrape Mode

Scrape mode allows you to convert a single URL to various formats:

  • Markdown: Clean, readable markdown format
  • HTML: Raw HTML content
  • Screenshot: Visual capture of the page
  • Links: Extract all links from the page

Advanced options include:

  • Only Main Content: Filter out navigation, footers, etc.
  • Remove Base64 Images: Exclude embedded images
  • Wait For: Time in milliseconds to wait for dynamic content to load
  • Timeout: Maximum time in milliseconds to wait for the page to load
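
For reference, here is roughly how these options map onto a Firecrawl v1 scrape request. This is an illustrative sketch rather than the playground's exact payload; the field names follow the public Firecrawl v1 API, so verify them against your instance's version:

    // Sketch of a /v1/scrape request with the options above.
    // Run as an ES module (or wrap in an async function); Node 18+ for fetch.
    const res = await fetch('http://firecrawl:3002/v1/scrape', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        url: 'https://example.com',
        formats: ['markdown', 'links'],
        onlyMainContent: true,     // "Only Main Content"
        removeBase64Images: true,  // "Remove Base64 Images"
        waitFor: 3000,             // "Wait For", in milliseconds
        timeout: 30000,            // "Timeout", in milliseconds
      }),
    });
    const { data } = await res.json();
    console.log(data.markdown);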

Crawl Mode

Crawl mode allows you to discover and scrape multiple pages from a starting URL:

  • Max Depth: How many links deep to crawl
  • Page Limit: Maximum number of pages to crawl
  • Ignore Sitemap: Skip sitemap.xml discovery
  • Allow External Links: Crawl links to external domains
  • Include/Exclude Paths: Filter which paths to crawl
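
As a sketch of what Crawl mode does against the API: Firecrawl v1 crawls are asynchronous, so starting a crawl returns a job id that you then poll for results. Field names below follow the public v1 API and may differ on your instance:

    // Start a crawl job (run as an ES module or inside an async function).
    const start = await fetch('http://firecrawl:3002/v1/crawl', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        url: 'https://example.com',
        maxDepth: 2,               // "Max Depth"
        limit: 10,                 // "Page Limit"
        ignoreSitemap: false,      // "Ignore Sitemap"
        allowExternalLinks: false, // "Allow External Links"
        includePaths: ['blog'],    // "Include Paths"
        excludePaths: ['admin'],   // "Exclude Paths"
        scrapeOptions: { formats: ['markdown'] },
      }),
    });
    const { id } = await start.json();

    // Poll the job until it reports completion.
    const status = await (await fetch(`http://firecrawl:3002/v1/crawl/${id}`)).json();
    console.log(status.status, status.data?.length ?? 0, 'pages');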

Extract Mode

Extract mode allows you to extract structured data from web pages using an LLM:

  • Extraction Prompt: Instructions for what data to extract
  • JSON Schema: Optional schema for structured data extraction
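
Under the hood this corresponds roughly to Firecrawl's extraction options on a scrape request. One common shape in the v1 API is sketched below; the exact fields accepted depend on your Firecrawl version, so treat this as illustrative:

    // Prompt-only extraction sketch (Firecrawl v1 "extract" format).
    // Run as an ES module or inside an async function.
    const res = await fetch('http://firecrawl:3002/v1/scrape', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        url: 'https://example.com',
        formats: ['extract'],
        extract: {
          prompt: 'Extract the main heading, summary, and author from this page.',
        },
      }),
    });
    console.log((await res.json()).data.extract);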

API Compatibility

This playground is designed to work with self-hosted Firecrawl API instances. It's compatible with the Firecrawl API v1 endpoints.

Development

This is a lightweight application built with vanilla JavaScript, HTML, and CSS. Dependencies are loaded from CDNs:

  • Milligram CSS for minimal styling
  • Marked.js for markdown rendering
  • Highlight.js for syntax highlighting

No build process is required: simply edit the files and refresh the browser to see changes.
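
As an illustration of how those CDN libraries fit together in a page like this one (a sketch; `markdownFromFirecrawl` and the `#output` element are placeholder names, not identifiers from this repo):

    // Render scraped markdown, then syntax-highlight any code blocks in it.
    // `marked` and `hljs` are the globals exposed by the CDN scripts.
    const html = marked.parse(markdownFromFirecrawl);
    document.querySelector('#output').innerHTML = html;
    document.querySelectorAll('#output pre code').forEach((block) => {
      hljs.highlightElement(block);
    });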

Technical Details

  • Server: Node.js with Express
  • Proxy: Custom HTTP proxy middleware
  • Configuration: Environment variables via dotenv (.env file)
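
To make the CORS-free design concrete, here is a minimal sketch of the proxy idea: the browser only ever talks to the Express server's own origin, and the server forwards each request to the Firecrawl instance. This is an illustration rather than the repo's actual code, and FIRECRAWL_API_URL is an assumed variable name (check .example.env for the real configuration keys):

    // Minimal proxy sketch (Node 18+ for the global fetch).
    require('dotenv').config();
    const express = require('express');

    const app = express();
    app.use(express.json());

    // Assumed variable name; see .example.env for the real keys.
    const upstream = process.env.FIRECRAWL_API_URL || 'http://firecrawl:3002';

    app.use('/api', async (req, res) => {
      try {
        // Express strips the /api mount path, so /api/v1/scrape forwards
        // to <upstream>/v1/scrape.
        const response = await fetch(upstream + req.url, {
          method: req.method,
          headers: {
            'content-type': 'application/json',
            // Pass through the user's API key if one was provided.
            ...(req.headers.authorization && { authorization: req.headers.authorization }),
          },
          body: ['GET', 'HEAD'].includes(req.method) ? undefined : JSON.stringify(req.body),
        });
        res.status(response.status).json(await response.json());
      } catch (err) {
        res.status(502).json({ error: String(err) });
      }
    });

    app.listen(process.env.PORT || 3000);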

Firecrawler Examples

Here are some examples of how to use Firecrawler in each mode.

Scrape Mode Examples

Basic Markdown Conversion

  1. Enter URL: https://smcleod.net
  2. Select Format: markdown
  3. Enable "Only Main Content"
  4. Click "Run"

Screenshot Capture

  1. Enter URL: https://news.ycombinator.com
  2. Select Formats: markdown, screenshot
  3. Set Wait For: 3000 (3 seconds)
  4. Click "Run"
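
If you want to script the same thing, here is a sketch of saving the screenshot with Node 18+ (run as an ES module). Self-hosted instances have returned the screenshot as either a URL or a base64 string depending on version, so this handles both:

    // Fetch a page with a screenshot and save it to disk.
    import fs from 'node:fs';

    const res = await fetch('http://firecrawl:3002/v1/scrape', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        url: 'https://news.ycombinator.com',
        formats: ['markdown', 'screenshot'],
        waitFor: 3000,
      }),
    });
    const { data } = await res.json();

    if (data.screenshot?.startsWith('http')) {
      const img = await fetch(data.screenshot);
      fs.writeFileSync('page.png', Buffer.from(await img.arrayBuffer()));
    } else if (data.screenshot) {
      fs.writeFileSync('page.png', Buffer.from(data.screenshot, 'base64'));
    }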

HTML Extraction

  1. Enter URL: https://github.com
  2. Select Formats: html, markdown
  3. Disable "Only Main Content" to get the full page
  4. Click "Run"

Crawl Mode Examples

Basic Website Crawl

  1. Switch to "Crawl" mode
  2. Enter URL: https://smcleod.net
  3. Set Max Depth: 2
  4. Set Page Limit: 10
  5. Select Format: markdown
  6. Click "Run"

Blog Crawl with Path Filtering

  1. Switch to "Crawl" mode
  2. Enter URL: https://smcleod.net/about
  3. Set Max Depth: 3
  4. Set Page Limit: 20
  5. Include Paths: blog,posts
  6. Exclude Paths: admin,login,register
  7. Click "Run"

Extract Mode Examples

Basic Content Extraction

  1. Switch to "Extract" mode
  2. Enter URL: https://smcleod.net
  3. Extraction Prompt: Extract the main heading, summary, and author from this page.
  4. Click "Run"

Structured Data Extraction

  1. Switch to "Extract" mode
  2. Enter URL: https://news.ycombinator.com
  3. Extraction Prompt: Extract the top 5 stories with their titles, points, and authors.
  4. JSON Schema:
    {
      "type": "object",
      "properties": {
        "stories": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": { "type": "string" },
              "points": { "type": "number" },
              "author": { "type": "string" }
            }
          }
        }
      }
    }
  5. Click "Run"
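
Expressed as a direct API call, the same example might look like the sketch below (illustrative; run as an ES module, and verify the extract fields against your Firecrawl version):

    // Structured extraction sketch using the schema shown above.
    const schema = {
      type: 'object',
      properties: {
        stories: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              title: { type: 'string' },
              points: { type: 'number' },
              author: { type: 'string' },
            },
          },
        },
      },
    };

    const res = await fetch('http://firecrawl:3002/v1/scrape', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        url: 'https://news.ycombinator.com',
        formats: ['extract'],
        extract: {
          prompt: 'Extract the top 5 stories with their titles, points, and authors.',
          schema,
        },
      }),
    });
    console.log((await res.json()).data.extract);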
