package extract
v1.0.0 (Latest)
Warning: This package is not in the latest version of its module.

Published: Nov 26, 2025 | License: MIT | Imports: 5 | Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Article

type Article struct {
	Title       string
	Byline      string
	ContentHTML string
	Text        string
	Excerpt     string
	TopImageURL string
}

Article represents the extracted main content of an HTML document.

ContentHTML contains a sanitized HTML fragment of the main article. Text contains plain text derived from ContentHTML. Excerpt is a short summary taken from the beginning of Text.
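The documentation says only that Excerpt is derived from the beginning of Text; it does not state the length limit or truncation rule. The sketch below shows one plausible way to do it, assuming whitespace is collapsed and the cut lands on a word boundary. `makeExcerpt` and its `maxRunes` parameter are hypothetical and are not part of this package's API.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// makeExcerpt is a hypothetical helper (not part of package extract).
// It collapses whitespace in text and returns at most maxRunes runes from
// the beginning, preferring to cut at the last space and appending "..."
// when the text had to be truncated.
func makeExcerpt(text string, maxRunes int) string {
	if maxRunes <= 0 {
		return ""
	}
	// Collapse runs of whitespace (spaces, tabs, newlines) into single spaces.
	normalized := strings.Join(strings.Fields(text), " ")
	if utf8.RuneCountInString(normalized) <= maxRunes {
		return normalized
	}
	// Slice by runes, not bytes, so multi-byte characters are never split.
	cut := string([]rune(normalized)[:maxRunes])
	// End on a word boundary if the prefix contains one.
	if i := strings.LastIndex(cut, " "); i > 0 {
		cut = cut[:i]
	}
	return cut + "..."
}

func main() {
	text := "Go is an open source programming language.\n\nIt makes it simple to build software."
	fmt.Println(makeExcerpt(text, 30)) // Go is an open source...
}
```

Cutting at a space keeps the excerpt from ending mid-word; a real implementation might instead cut at a sentence boundary.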

func Extract

func Extract(doc *ihtml.Document, baseURL string) *Article

Extract runs a Readability-style content-extraction algorithm on a parsed HTML Document and returns the resulting Article.

baseURL is optional and reserved for future enhancements, such as resolving relative URLs or canonical links.
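As an illustration of the kind of enhancement mentioned above, the sketch below resolves a relative reference (for example an image `src` that could feed TopImageURL) against a base URL using the standard library's `net/url`. `resolveURL` is a hypothetical helper, not a function this package exports, and it assumes an empty baseURL means "leave references as they are".

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveURL is a hypothetical helper (not part of package extract).
// It resolves ref against baseURL per RFC 3986. If baseURL is empty or
// either value fails to parse, ref is returned unchanged, matching the
// idea that baseURL is optional.
func resolveURL(baseURL, ref string) string {
	if baseURL == "" {
		return ref
	}
	base, err := url.Parse(baseURL)
	if err != nil {
		return ref
	}
	r, err := url.Parse(ref)
	if err != nil {
		return ref
	}
	// ResolveReference also removes "." and ".." path segments.
	return base.ResolveReference(r).String()
}

func main() {
	fmt.Println(resolveURL("https://example.com/blog/post.html", "../img/top.png"))
	// https://example.com/img/top.png
}
```

Absolute references pass through `ResolveReference` unchanged, so the helper is safe to apply to every URL found in the extracted content.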
