@rshirohara/pxast v0.2.2
pxast
Pixiv novel Abstract Syntax Tree
pxast is a specification for representing pixiv novel format in a syntax tree. It implements unist.
Introduction
This document defines a format for representing pixiv novel format as an abstract syntax tree. This specification is written in a Web IDL-like grammar.
Types
If you are using TypeScript, you can use the pxast types by installing them with npm:
npm install @rshirohara/pxast
Nodes
Parent
interface Parent <: UnistParent {
children: [PxastContent]
}
Parent (UnistParent) represents an abstract interface in pxast containing other nodes (said to be children).
Its content is limited to only other pxast content.
Literal
interface Literal <: UnistLiteral {
value: string
}
Literal (UnistLiteral) represents an abstract interface in pxast containing a value.
Its value field is a string.
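As a rough illustration of how the two abstract interfaces can be told apart at runtime, here is a minimal sketch. The names `PxastNode`, `ParentNode_`, `LiteralNode`, `isParent` and `isLiteral` are hypothetical local stand-ins written for this example; they are not taken from the package's exports.

```typescript
// Hypothetical local stand-ins for the shapes described above
// (named to avoid clashing with DOM globals such as Node).
interface PxastNode {
  type: string;
}

interface ParentNode_ extends PxastNode {
  children: PxastNode[];
}

interface LiteralNode extends PxastNode {
  value: string;
}

// A node is treated as a Parent when it carries a children array.
function isParent(node: PxastNode): node is ParentNode_ {
  return Array.isArray((node as Partial<ParentNode_>).children);
}

// A node is treated as a Literal when it carries a string value.
function isLiteral(node: PxastNode): node is LiteralNode {
  return typeof (node as Partial<LiteralNode>).value === "string";
}
```

These guards only inspect the shape of a node; they do not check that the children are valid pxast content.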
Root
interface Root <: Parent {
type: "root"
}
Root (Parent) represents a document.
Root can be used as the root of a tree, never as a child.
Paragraph
interface Paragraph <: Parent {
type: "paragraph"
children: [PhrasingContent]
}
Paragraph (Parent) represents a unit of discourse dealing with a particular point.
Paragraph can be used where flow content is expected. Its content model is phrasing content.
For example, the following text:
たとえば私はこの文章を書く。
Yields:
{
type: "paragraph",
children: [{ type: "text", value: "たとえば私はこの文章を書く。" }]
}
Heading
interface Heading <: Parent {
type: "heading"
children: [InlinePhrasingContent]
}
Heading (Parent) represents a heading of a section.
Heading can be used where flow content is expected. Its content model is inline phrasing content.
For example, the following text:
[chapter:まえがき]
Yields:
{
type: "heading",
children: [{ type: "text", value: "まえがき" }]
}
PageHeading
interface PageHeading <: Node {
type: "pageHeading"
pageNumber: 1 <= number
}
PageHeading (Node) represents a heading of a page.
PageHeading can be used where flow content is expected. It has no content model.
A pageNumber field must be present. Its minimum value is 1.
For example, the following text:
ここは一ページ目。
[newpage]
ここが二ページ目。
Yields:
{
type: "root",
children: [
{ type: "pageHeading", pageNumber: 1 },
{
type: "paragraph",
children: [{ type: "text", value: "ここは一ページ目。" }]
},
{ type: "pageHeading", pageNumber: 2 },
{
type: "paragraph",
children: [{ type: "text", value: "ここが二ページ目。" }]
}
]
}
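The page numbering in the example above can be sketched as follows. This is a simplified illustration, not the package's parser: `splitPages` and the `...Node` type names are hypothetical, and each page is assumed to hold a single paragraph of plain text with no other markup.

```typescript
// Hypothetical local types mirroring the node shapes in this document.
type TextNode = { type: "text"; value: string };
type ParagraphNode = { type: "paragraph"; children: TextNode[] };
type PageHeadingNode = { type: "pageHeading"; pageNumber: number };
type RootNode = { type: "root"; children: (PageHeadingNode | ParagraphNode)[] };

// Split the source on [newpage] and emit one pageHeading per page,
// numbering pages from 1 as the pageNumber field requires.
function splitPages(source: string): RootNode {
  const children: RootNode["children"] = [];
  source.split("[newpage]").forEach((page, index) => {
    children.push({ type: "pageHeading", pageNumber: index + 1 });
    const value = page.trim();
    if (value !== "") {
      // Simplification: the whole page becomes one text-only paragraph.
      children.push({ type: "paragraph", children: [{ type: "text", value }] });
    }
  });
  return { type: "root", children };
}
```

Calling `splitPages("ここは一ページ目。\n[newpage]\nここが二ページ目。")` produces the same tree as the example above.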
Text
interface Text <: Literal {
type: "text"
}
Text (Literal) represents everything that is just text.
Text can be used where phrasing content is expected.
Its content is represented by its value field.
For example, the following text:
たとえば私はこの文章を書く。
Yields:
{ type: "text", value: "たとえば私はこの文章を書く。" }
Ruby
interface Ruby <: Literal {
type: "ruby"
ruby: string
}
Ruby (Literal) represents small annotations that are rendered above, below, or next to text.
Ruby can be used where phrasing content is expected.
Its content is represented by its value and ruby fields.
For example, the following text:
[[rb:私>わたし]]
Yields:
{
type: "ruby",
value: "私",
ruby: "わたし"
}
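To make the mapping from the `[[rb:...]]` notation to the value and ruby fields concrete, here is a minimal sketch. `parseRuby` and `RubyNode` are hypothetical names for this example, and the regular expression is an assumption about the notation based only on the sample above.

```typescript
// Hypothetical local type mirroring the Ruby interface above.
type RubyNode = { type: "ruby"; value: string; ruby: string };

// Parse a single ruby tag such as [[rb:私>わたし]].
// Returns undefined when the input is not exactly one ruby tag.
function parseRuby(source: string): RubyNode | undefined {
  const match = /^\[\[rb:([^>]+)>([^\]]+)\]\]$/.exec(source.trim());
  if (!match) return undefined;
  // The text before ">" is the base text, the text after it is the annotation.
  return { type: "ruby", value: match[1].trim(), ruby: match[2].trim() };
}
```

For the sample input, `parseRuby("[[rb:私>わたし]]")` returns `{ type: "ruby", value: "私", ruby: "わたし" }`.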
Break
interface Break <: Node {
type: "break"
}
Break (Node) represents a line break.
Break can be used where phrasing content is expected. It has no content model.
For example, the following text:
これは一行目。
これが二行目。
Yields:
{
type: "paragraph",
children: [
{ type: "text", value: "これは一行目。" },
{ type: "break" },
{ type: "text", value: "これが二行目。" }
]
}
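The example above can be reproduced with a small sketch that turns each line boundary inside one paragraph into a break node. `linesToParagraph` and the `...Node` types are hypothetical names; how blank lines separate paragraphs is not covered here and is left as an assumption outside this sketch.

```typescript
// Hypothetical local types mirroring the node shapes in this document.
type TextNode = { type: "text"; value: string };
type BreakNode = { type: "break" };
type ParagraphNode = { type: "paragraph"; children: (TextNode | BreakNode)[] };

// Convert the lines of a single paragraph into text nodes separated by breaks.
function linesToParagraph(block: string): ParagraphNode {
  const children: (TextNode | BreakNode)[] = [];
  block.split("\n").forEach((line, index) => {
    // Every line boundary becomes one break node.
    if (index > 0) children.push({ type: "break" });
    // Empty lines contribute no text node.
    if (line !== "") children.push({ type: "text", value: line });
  });
  return { type: "paragraph", children };
}
```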
Link
interface Link <: Parent {
type: "link"
url: string
children: [InlinePhrasingContent]
}
Link (Parent) represents a hyperlink.
Link can be used where phrasing content is expected. Its content model is inline phrasing content.
For example, the following text:
[[jumpurl:リンク例>https://example.com]]
Yields:
{
type: "link",
url: "https://example.com",
children: [{ type: "text", value: "リンク例" }]
}
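A possible way to read the `[[jumpurl:...]]` notation into a link node is sketched below. `parseLink` and the type names are hypothetical, the pattern is inferred from the sample alone, and the link title is treated as plain text even though the content model also allows ruby.

```typescript
// Hypothetical local types mirroring the node shapes in this document.
type TextNode = { type: "text"; value: string };
type LinkNode = { type: "link"; url: string; children: TextNode[] };

// Parse a single link tag such as [[jumpurl:title>https://example.com]].
// Returns undefined when the input is not exactly one link tag.
function parseLink(source: string): LinkNode | undefined {
  const match = /^\[\[jumpurl:([^>]+)>\s*(\S+?)\s*\]\]$/.exec(source.trim());
  if (!match) return undefined;
  return {
    type: "link",
    url: match[2],
    // Simplification: the title is stored as one text child.
    children: [{ type: "text", value: match[1].trim() }]
  };
}
```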
Image
interface Image <: Node {
type: "image"
illustId: string
pageNumber: 1 <= number?
}
Image (Node) represents a reference to a pixiv image.
Image can be used where phrasing content is expected. It has no content model.
For example, the following text:
[pixivimage:000001-02]
Yields:
{
type: "image",
illustId: "000001",
pageNumber: 2
}
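The optional pageNumber field can be illustrated with the following sketch, which keeps the illustration ID as a string (preserving leading zeros) and reads the page suffix as a number. `parseImage` and `ImageNode` are hypothetical names, and the pattern is an assumption drawn from the sample above.

```typescript
// Hypothetical local type mirroring the Image interface above.
type ImageNode = { type: "image"; illustId: string; pageNumber?: number };

// Parse a single image tag such as [pixivimage:000001-02] or [pixivimage:000001].
// Returns undefined for non-matching input or a page number below 1.
function parseImage(source: string): ImageNode | undefined {
  const match = /^\[pixivimage:(\d+)(?:-(\d+))?\]$/.exec(source.trim());
  if (!match) return undefined;
  const image: ImageNode = { type: "image", illustId: match[1] };
  if (match[2] !== undefined) {
    const pageNumber = Number.parseInt(match[2], 10);
    if (pageNumber < 1) return undefined;
    image.pageNumber = pageNumber;
  }
  return image;
}
```

When the tag has no page suffix, the pageNumber field is simply omitted from the resulting node.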
PageReference
interface PageReference <: Node {
type: "pageReference"
pageNumber: 1 <= number
}
PageReference (Node) represents a reference to a PageHeading.
PageReference can be used where phrasing content is expected. It has no content model.
A pageNumber field must be present. Its minimum value is 1.
For example, the following text:
[jump:01]
Yields:
{
type: "pageReference",
pageNumber: 1
}
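Because a PageReference points at a PageHeading by page number, a tool may want to check that every reference resolves. The sketch below is one way to do that; this document does not require such a check, and `findDanglingReferences` and `TreeNode` are hypothetical names introduced for illustration.

```typescript
// Hypothetical loose node shape, sufficient for walking a pxast tree.
interface TreeNode {
  type: string;
  children?: TreeNode[];
  pageNumber?: number;
}

// Return the page numbers of pageReference nodes that have no matching pageHeading.
function findDanglingReferences(root: TreeNode): number[] {
  const pages = new Set<number>();
  const references: number[] = [];
  const visit = (node: TreeNode): void => {
    if (node.pageNumber !== undefined) {
      if (node.type === "pageHeading") pages.add(node.pageNumber);
      if (node.type === "pageReference") references.push(node.pageNumber);
    }
    (node.children ?? []).forEach(visit);
  };
  // Collect everything first so references to later pages still resolve.
  visit(root);
  return references.filter((pageNumber) => !pages.has(pageNumber));
}
```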
Content model
type PxastContent = FlowContent | PhrasingContent
Each node in pxast falls into one or more categories of Content that group nodes with similar characteristics together.
FlowContent
type FlowContent = Heading | PageHeading | Paragraph
Flow content represents the sections of a document.
PhrasingContent
type PhrasingContent = Break | Image | Link | PageReference | InlinePhrasingContent
Phrasing content represents the text in a document, and its markup.
InlinePhrasingContent
type InlinePhrasingContent = Ruby | Text
Inline phrasing content represents the text in a document, and its markup, that is intended to be contained in other phrasing content.
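The phrasing categories above map naturally onto TypeScript discriminated unions. The sketch below restates them with hypothetical local type names (suffixed with `Node` to avoid DOM globals) and adds an illustrative `toPlainText` helper, which is not part of this specification, to show how the `type` field narrows each case.

```typescript
// Hypothetical local types mirroring the interfaces in this document.
type TextNode = { type: "text"; value: string };
type RubyNode = { type: "ruby"; value: string; ruby: string };
type BreakNode = { type: "break" };
type ImageNode = { type: "image"; illustId: string; pageNumber?: number };
type PageReferenceNode = { type: "pageReference"; pageNumber: number };

type InlinePhrasingContent = RubyNode | TextNode;
type LinkNode = { type: "link"; url: string; children: InlinePhrasingContent[] };
type PhrasingContent =
  | BreakNode
  | ImageNode
  | LinkNode
  | PageReferenceNode
  | InlinePhrasingContent;

// Flatten one phrasing node to plain text, dropping ruby annotations.
function toPlainText(node: PhrasingContent): string {
  switch (node.type) {
    case "text":
    case "ruby":
      return node.value;
    case "break":
      return "\n";
    case "link":
      return node.children.map((child) => toPlainText(child)).join("");
    case "image":
    case "pageReference":
      // These nodes carry no text of their own.
      return "";
  }
}
```

Since the switch covers every member of the union, the compiler can flag a missing case if a new phrasing node type is added later.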