This is an implementation of a literate programming system in D. The goal is to be able to create books that one can read on a website, with chapters, subchapters, and sections, and additionally to be able to compile the code from the book into a working program.
Literate programming aims to make the source code of a program understandable. The program can be structured in any way the programmer likes, and the code should be explained.
The source code for a literate program will somewhat resemble CWEB, but differ in many key ways which simplify the source code and make it easier to read. Literate will use @ signs for commands and markdown to style the prose.
A Literate program may be just a single file, but it should also be possible to make a book out of it, with chapters and possibly multiple programs in a single book. If the literate command line tool is run on a single file, it should compile that file; if it is run on a directory, it should search for the Summary.lit file in the directory and create a book.
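The file-versus-directory rule can be sketched as a small helper. The name entryFile is hypothetical and not part of Literate; the sketch only assumes that a book's index is always called Summary.lit and lives directly inside the directory.

```d
import std.file : exists, isDir;
import std.path : buildPath;
import std.stdio : writeln;

/// Hypothetical helper (not part of Literate): picks the file to parse.
string entryFile(string path) {
    // isDir throws for a missing path, so check exists first.
    if (path.exists && path.isDir) {
        return buildPath(path, "Summary.lit");
    }
    // Anything else is treated as a single-file program.
    return path;
}

void main() {
    writeln(entryFile("."));          // a directory: look for its Summary.lit
    writeln(entryFile("parser.lit")); // a plain file name: returned unchanged
}
```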
What should the directory structure of a Literate book look like?
I try to mimic the Gitbook software here. There will be a Summary.lit file which links to each of the different chapters in the book. An example Summary.lit file might look like this:
@title Title of the book
[Chapter 1](chapter1/intro.lit)
	[Subchapter 1](chapter1/example1.lit)
	[Subchapter 2](chapter1/example2.lit)
[Chapter 2](section2/intro.lit)
	[Subchapter 1](chapter2/example1.lit)
Subchapters are denoted by tabs, and each chapter is linked to the correct .lit file using Markdown link syntax.
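As an illustration of how indentation could translate into chapter numbers, here is a minimal sketch. The function numberChapters and its output format are made up for this example; the link pattern and the major/minor scheme follow the description above, and any leading tab or space is treated as marking a subchapter.

```d
import std.conv : to;
import std.regex : matchFirst, regex;
import std.string : splitLines;

/// Hypothetical helper: returns one "num title -> path" entry per link line.
string[] numberChapters(string summary) {
    auto link = regex(`\[(?P<chapterName>.*)\]\((?P<filepath>.*)\)`);
    string[] result;
    int major, minor;
    foreach (line; summary.splitLines()) {
        auto m = matchFirst(line, link);
        if (m.empty) {
            continue; // not a chapter link (for example the @title line)
        }
        // Assumption: leading whitespace marks a subchapter.
        if (line[0] == '\t' || line[0] == ' ') {
            minor++;
        } else {
            major++;
            minor = 0;
        }
        string num = minor != 0
            ? to!string(major) ~ "." ~ to!string(minor)
            : to!string(major);
        result ~= num ~ " " ~ m["chapterName"] ~ " -> " ~ m["filepath"];
    }
    return result;
}

void main() {
    string summary = "@title Title of the book\n"
        ~ "[Chapter 1](chapter1/intro.lit)\n"
        ~ "\t[Subchapter 1](chapter1/example1.lit)\n"
        ~ "[Chapter 2](chapter2/intro.lit)\n";
    assert(numberChapters(summary) == [
        "1 Chapter 1 -> chapter1/intro.lit",
        "1.1 Subchapter 1 -> chapter1/example1.lit",
        "2 Chapter 2 -> chapter2/intro.lit",
    ]);
}
```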
As a first step, I'll make a parser for single chapters only, and leave having multiple chapters and books for later.
The parser will have 2 main parts to it: the classes which represent the various structures in a literate program, and the parse function.
I'll quickly list the imports here.
import globals;
import std.stdio;
import util;
import std.string: split, endsWith, startsWith, chomp, replace, strip;
import std.algorithm: canFind;
import std.regex: matchAll, matchFirst, regex, ctRegex, splitter;
import std.conv;
import std.path: extension;
import std.file;
Now we have to define the classes used to represent a literate program. There are 7 such classes: Program, Chapter, Section, Block, Command, Line, and Change.
What is a literate program at the highest level? A program has multiple chapters, it has a title, and it has various commands associated with it (although some of these commands may be overwritten by chapters or even sections). It also has the file it originally came from.
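A rough sketch of what such a class could look like follows. The field names are inferred from how parseProgram reads and writes them (title, file, chapters, text) and from the description above (commands). The empty Command and Chapter classes are placeholders so the snippet compiles on its own; this is an assumed shape, not the listing from the Literate source.

```d
// Placeholder types so the sketch compiles on its own.
class Command {}
class Chapter {}

// Assumed shape of a program, inferred from its use in parseProgram.
class Program {
    public string title;
    public Command[] commands;
    public Chapter[] chapters;
    public string file;
    public string text; // lines of the source that are not chapter links

    this() {
        commands = [];
        chapters = [];
    }
}

void main() {
    auto p = new Program();
    p.file = "Summary.lit";
    p.title = "Title of the book";
    p.chapters ~= new Chapter();
    assert(p.chapters.length == 1);
    assert(p.commands.length == 0);
}
```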
A chapter is very similar to a program. It has a title, commands, sections, and also an original file. In the case of a single file program (which is what we are focusing on for the moment) the Program's file and the Chapter's file will be the same. A chapter also has a minor number and a major number.
class Chapter {
    public string title;
    public Command[] commands;
    public Section[] sections;
    public string file;

    public int majorNum;
    public int minorNum;

    this() {
        commands = [];
        sections = [];
    }

    string num() {
        if (minorNum != 0) {
            return to!string(majorNum) ~ "." ~ to!string(minorNum);
        } else {
            return to!string(majorNum);
        }
    }
}
Used in section 4
A section has a title, commands, a number, and a series of blocks, which can either be blocks of code, or blocks of prose.
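A section could be modeled along the following lines. The fields mirror the ones the parser touches (title, commands, num, blocks); the stub Command and Block classes are placeholders, and the whole snippet is a sketch under those assumptions rather than the original listing.

```d
// Placeholder types so the sketch compiles on its own.
class Command {}
class Block {}

// Assumed shape of a section, inferred from how the parser uses it.
class Section {
    public string title;
    public Command[] commands;
    public Block[] blocks; // code blocks and prose blocks, in source order
    public int num;

    this() {
        commands = [];
        blocks = [];
    }
}

void main() {
    auto s = new Section();
    s.title = "Imports";
    s.num = 1;
    s.blocks ~= new Block();
    assert(s.blocks.length == 1 && s.num == 1);
}
```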
A block is more interesting. It can either be a block of code, or a block of prose, so it has a boolean which represents what type it is. It also stores a start line. If it is a code block, it also has a name. Finally, it stores an array of lines, and has a function called text() which just returns the string of the text it contains. A block also contains a codeType and a commentString.
class Block {
    public int startLine;
    public string name;
    public bool isCodeblock;
    public bool isRootBlock;
    public Line[] lines;

    public string codeType;
    public string commentString;

    public Modifier[] modifiers;

    this() {
        lines = [];
        modifiers = [];
    }

    string text() {
        string text = "";
        foreach (line; lines) {
            text ~= line.text ~ "\n";
        }
        return text;
    }

    Block dup() {
        Block b = new Block();
        b.startLine = startLine;
        b.name = name;
        b.isCodeblock = isCodeblock;
        b.codeType = codeType;
        b.commentString = commentString;
        b.modifiers = modifiers;

        foreach (Line l; lines) {
            b.lines ~= l.dup();
        }

        return b;
    }
}
Used in section 4
A command is quite simple. It has a name, and any arguments that are passed.
A line is the lowest level. It stores the line number, the file the line is from, and the text for the line itself.
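The two smallest classes could look roughly like this. The field names and the Line constructor's argument order are inferred from how the parser uses them (cmd.name, cmd.args, cmd.lineNum, cmd.filename, new Line(line, filename, lineNum), l.dup()); everything else is an assumption made for the sake of a self-contained sketch.

```d
// Assumed shape of a command: its name plus the raw argument string.
class Command {
    public string name;
    public string args;
    public int lineNum;
    public string filename;
}

// Assumed shape of a line: where it came from and what it says.
class Line {
    public string file;
    public int lineNum;
    public string text;

    this(string text, string file, int lineNum) {
        this.text = text;
        this.file = file;
        this.lineNum = lineNum;
    }

    // Returns an independent copy, as Block.dup() expects.
    Line dup() {
        return new Line(text, file, lineNum);
    }
}

void main() {
    auto original = new Line("int x;", "main.lit", 3);
    auto copy = original.dup();
    assert(copy !is original);
    assert(copy.text == "int x;" && copy.file == "main.lit" && copy.lineNum == 3);

    auto cmd = new Command();
    cmd.name = "@code_type";
    cmd.args = "d .d";
    assert(cmd.name == "@code_type");
}
```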
The Change class helps when parsing a change statement. It stores the file that is being changed, the text to search for, and the text to replace it with. The search and replacement texts are arrays because you can make multiple changes (search-and-replace pairs) to one file. To keep track of the current change, an index is also stored.
class Change {
    public string filename;
    public string[] searchText;
    public string[] replaceText;
    public int index;

    this() {
        searchText = [];
        replaceText = [];
        index = 0;
    }
}
Used in section 4
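To show how the paired arrays are meant to be consumed, here is a small sketch that applies each search text and its replacement in order. The function applyChanges is hypothetical; it takes the two arrays directly instead of a Change object so that the snippet stays self-contained.

```d
import std.array : replace;

/// Hypothetical helper: applies searchText[i] -> replaceText[i] in order.
string applyChanges(string text, string[] searchText, string[] replaceText) {
    assert(searchText.length == replaceText.length, "arrays must be paired");
    foreach (i; 0 .. searchText.length) {
        text = text.replace(searchText[i], replaceText[i]);
    }
    return text;
}

void main() {
    string original = "int x = 1;\nint y = 2;\n";
    string patched = applyChanges(original, ["int x = 1;\n"], ["int x = 10;\n"]);
    assert(patched == "int x = 10;\nint y = 2;\n");
}
```

Keeping the two arrays index-aligned means a change is fully described by one position, which is why a single index is enough to track the current change.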
That's it for the classes. These 7 classes can be used to represent an entire literate program. Now let's get to the actual parse function to turn a text file into a program.
Here we have two functions: parseProgram and parseChapter.
The parseProgram function takes a literate book source, parses each chapter, and returns the final program.
Here is an example book:
@title Title of the book
[Chapter 1](chapter1/intro.lit)
	[Subchapter 1](chapter1/example1.lit)
	[Subchapter 2](chapter1/example2.lit)
[Chapter 2](section2/intro.lit)
	[Subchapter 1](chapter2/example1.lit)
Program parseProgram(Program p, string src) {
    string filename = p.file;
    bool hasChapters;

    string[] lines = src.split("\n");
    int lineNum;
    int majorNum;
    int minorNum;
    foreach (line; lines) {
        lineNum++;

        if (line.startsWith("@title")) {
            p.title = strip(line[6..$]);
        } else if (line.startsWith("@book")) {
            continue;
        } else if (auto matches = matchFirst(line, regex(r"\[(?P<chapterName>.*)\]\((?P<filepath>.*)\)"))) {
            if (matches["filepath"] == "") {
                error(filename, lineNum, "No filepath for " ~ matches["chapterName"]);
                continue;
            }
            if (leadingWS(line).length > 0) {
                minorNum++;
            } else {
                majorNum++;
                minorNum = 0;
            }
            Chapter c = new Chapter();
            c.file = matches["filepath"];
            c.title = matches["chapterName"];
            c.majorNum = majorNum;
            c.minorNum = minorNum;

            p.chapters ~= parseChapter(c, readall(File(matches["filepath"])));
            hasChapters = true;
        } else {
            p.text ~= line ~ "\n";
        }
    }

    return p;
}
Used in section 12
The parseChapter function is the more complex one. It parses the source of a chapter. Before doing any parsing, we resolve the @include statements by replacing them with the contents of the file that was included. Then we loop through each line in the source and parse it, provided that it is not a comment (starting with //).
Chapter parseChapter(Chapter chapter, string src) {
    {Initialize some variables, 15}

    string[] blocks = [];

    string include(string file) {
        if (file == filename) {
            error(filename, 1, "Recursive include");
            return "";
        }
        if (!exists(file)){
            error(filename, 1, "File " ~ file ~ " does not exist");
            return "";
        }
        return readall(File(file));
    }

    // Handle the @include statements
    src = std.regex.replaceAll!(match => include(match[1]))(src, regex(`\n@include (.*)`));

    string[] lines = src.split("\n");

    int lineNum = 0;
    foreach (line; lines) {
        lineNum++;

        if (strip(line).startsWith("//") && !inCodeblock) {
            continue;
        }

        {Parse the line, 16}
    }
    {Close the last section, 22}

    return chapter;
}
Used in section 12
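The include step can be tried in isolation with an in-memory table standing in for the file system. The function resolveIncludes and the files table are assumptions made for this sketch; only the pattern and the use of replaceAll with a callback are taken from parseChapter.

```d
import std.regex : regex, replaceAll;

/// Hypothetical stand-in for the include step; files maps names to contents.
string resolveIncludes(string src, string[string] files) {
    // Unknown files expand to "", mirroring the error path of include().
    return replaceAll!(m => files.get(m[1], ""))(src, regex(`\n@include (.*)`));
}

void main() {
    string[string] files = ["part.lit": "included text"];
    string src = "intro\n@include part.lit\noutro";
    // The match starts at the newline before @include, so that newline
    // is replaced together with the command itself.
    assert(resolveIncludes(src, files) == "introincluded text\noutro");
}
```

One thing this sketch makes visible: because the pattern begins with \n, the line break in front of @include is consumed along with the command, so the included text is joined directly onto the end of the preceding line. An @include on the very first line of a file has no preceding newline and is therefore not matched by this pattern.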
For the initial variables, it would be nice to move the value for chapter.file into a variable called filename. Additionally, I'm going to need an array of all the possible commands that are recognized.
string filename = chapter.file;
string[] commands = ["@code_type", "@comment_type", "@compiler", "@error_format",
                     "@add_css", "@overwrite_css", "@colorscheme", "@include"];
Added to in section 15
Used in section 14
I also need to keep track of the current section that is being parsed, and the current block that is being parsed, because the parser is going through the file one line at a time. I'll also define the current change being parsed.
Section curSection;
int sectionNum = 0;
Block curBlock;
Change curChange;
Added to in section 15
Used in section 14
Finally, I need 3 flags to keep track of whether the parser is currently inside a codeblock, a search block, or a replace block.
When parsing a line, we are either inside a code block, or inside a prose block, or we are transitioning from one to the other. So we'll have an if statement to separate the two.
if (!inCodeblock) {
    // This might be a change block
    {Parse change block, 23}
    {Parse a command, 16}
    {Parse a title command, 16}
    {Parse a section definition, 17}
    {Parse the beginning of a code block, 18}
    else if (curBlock !is null) {
        if (line.split().length > 1) {
            if (commands.canFind(line.split()[0])) {
                continue;
            }
        }
        {Add the line to the list of lines, 21}
    }
} else if (startsWith(line, "---")) {
    {Begin a new prose block, 20}
} else if (curBlock !is null) {
    {Add the line to the list of lines, 21}
}
Used in section 14
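To make the switching between code and prose concrete, here is a stripped-down sketch of the same state machine. The function classify and its labels are hypothetical; only the inCodeblock flag and the --- conventions (a header is --- followed by a title, a bare --- closes the block) are taken from the parser.

```d
import std.algorithm.searching : startsWith;
import std.string : splitLines;

/// Hypothetical helper: labels every line as "prose", "open", "code" or "close".
string[] classify(string src) {
    string[] kinds;
    bool inCodeblock = false;
    foreach (line; src.splitLines()) {
        if (!inCodeblock && line.startsWith("---") && line.length > 3) {
            // Same condition as the regex ^---.+ : dashes followed by a title.
            inCodeblock = true;
            kinds ~= "open";
        } else if (inCodeblock && line.startsWith("---")) {
            inCodeblock = false;
            kinds ~= "close";
        } else {
            kinds ~= inCodeblock ? "code" : "prose";
        }
    }
    return kinds;
}

void main() {
    string src = "Some prose\n--- Main\nint x;\n---\nMore prose";
    assert(classify(src) == ["prose", "open", "code", "close", "prose"]);
}
```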
Parsing a command and the title command are both fairly simple, so let's look at them first.
To parse a command we first make sure that there is the command name, and any arguments. Then we check if the command is part of the list of commands we have. If it is, we create a new command object, fill in the name and arguments, and add it to the chapter object.
An @include command needs no special treatment at this point, because every @include statement was already replaced with the contents of the included file before the line-by-line pass began.
if (line.split().length > 1) {
    if (commands.canFind(line.split()[0])) {
        Command cmd = new Command();
        cmd.name = line.split()[0];
        auto index = cmd.name.length;
        cmd.args = strip(line[index..$]);
        cmd.lineNum = lineNum;
        cmd.filename = filename;
        if (cmd.args == "none") {
            cmd.args = "";
        }

        if (curSection is null) {
            chapter.commands ~= cmd;
        } else {
            curSection.commands ~= cmd;
        }
    }
}
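The name/argument split can be exercised on its own with a small sketch. ParsedCommand and parseCommand are hypothetical names; the logic follows the block above, including the convention that an argument of "none" means no arguments, and it likewise assumes the command starts at the first column of the line.

```d
import std.algorithm.searching : canFind;
import std.array : split;
import std.string : strip;

/// Hypothetical result type; an empty name means "not a command".
struct ParsedCommand {
    string name;
    string args;
}

/// Hypothetical helper mirroring the command-parsing block.
ParsedCommand parseCommand(string line, string[] known) {
    auto words = line.split(); // split on whitespace
    if (words.length > 1 && known.canFind(words[0])) {
        // Everything after the command name is the argument string.
        string args = strip(line[words[0].length .. $]);
        if (args == "none") {
            args = "";
        }
        return ParsedCommand(words[0], args);
    }
    return ParsedCommand("", "");
}

void main() {
    string[] known = ["@code_type", "@compiler"];
    assert(parseCommand("@code_type d .d", known) == ParsedCommand("@code_type", "d .d"));
    assert(parseCommand("@compiler none", known).args == "");
    assert(parseCommand("just some prose", known).name == "");
}
```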
Parsing an @title command is even simpler.
if (startsWith(line, "@title")) {
    chapter.title = strip(line[6..$]);
}
When a new section is created (using @s), we should add the current section to the list of sections for the chapter, and then we should create a new section, which becomes the current section.
else if (startsWith(line, "@s")) {
    if (curBlock !is null && !curBlock.isCodeblock) {
        if (strip(curBlock.text()) != "") {
            curSection.blocks ~= curBlock;
        }
    } else if (curBlock !is null && curBlock.isCodeblock) {
        error(chapter.file, curBlock.startLine, "Unclosed block {" ~ curBlock.name ~ "}");
    }
    // Make sure the section exists
    if (curSection !is null) {
        chapter.sections ~= curSection;
    }
    curSection = new Section();
    curSection.title = strip(line[2..$]);
    curSection.commands = chapter.commands ~ curSection.commands;
    curSection.num = ++sectionNum;

    curBlock = new Block();
    curBlock.isCodeblock = false;
}
Used in section 16
Codeblocks always begin with --- title, so we can use the regex ^---.+ to represent this. Once a new codeblock starts, the old one must be appended to the current section's list of blocks, and the current codeblock must be reset.
else if (matchAll(line, regex("^---.+"))) {
    if (curSection is null) {
        error(chapter.file, lineNum, "You must define a section with @s before writing a code block");
        continue;
    }

    if (curBlock !is null) {
        curSection.blocks ~= curBlock;
    }
    curBlock = new Block();
    curBlock.startLine = lineNum;
    curBlock.isCodeblock = true;
    curBlock.name = strip(line[3..$]);

    {Parse Modifiers, 19}

    if (blocks.canFind(curBlock.name)) {
        if (!curBlock.modifiers.canFind(Modifier.redef) && !curBlock.modifiers.canFind(Modifier.additive)) {
            error(filename, lineNum, "Redefinition of {" ~ curBlock.name ~ "}, use ':=' to redefine");
        }
    } else {
        blocks ~= curBlock.name;
    }

    foreach (cmd; curSection.commands) {
        if (cmd.name == "@code_type") {
            curBlock.codeType = cmd.args;
        } else if (cmd.name == "@comment_type") {
            curBlock.commentString = cmd.args;
        }
    }

    inCodeblock = true;
}
Used in section 16
Modifier format for a code block: --- Block Name --- noWeave +=.
The checkForModifiers ugliness is due to the lack of (?|...) and friends.
First half matches expressions with modifiers:
1. (?P<namea>\S.*) : Keep taking from the first non-whitespace character ...
2. [ \t]-{3}[ \t] : ... until it matches ---
3. (?P<modifiers>.+) : Matches everything after the separator.
Second half matches expressions with no modifiers: either Block Name alone, or with a floating separator, Block Name ---.
1. |(?P<nameb>\S.*?) : Same thing as #1 above, but stores it in nameb.
2. [ \t]*? : Checks for any amount of whitespace (including none).
3. (-{1,}$ : Checks for any floating - and verifies that nothing else is there until the end of the line.
4. |$) : Or just checks that there is nothing but the end of the line after the whitespace.
Returns either namea and modifiers, or just nameb.
auto checkForModifiers = ctRegex!(`(?P<namea>\S.*)[ \t]-{3}[ \t](?P<modifiers>.+)|(?P<nameb>\S.*?)[ \t]*?(-{1,}$|$)`);
auto splitOnSpace = ctRegex!(r"(\s+)");
auto modMatch = matchFirst(curBlock.name, checkForModifiers);

// matchFirst returns unmatched groups as empty strings
if (modMatch["namea"] != "") {
    curBlock.name = modMatch["namea"];
} else if (modMatch["nameb"] != ""){
    curBlock.name = modMatch["nameb"];
    // Check for old syntax.
    if (curBlock.name.endsWith("+=")) {
        curBlock.modifiers ~= Modifier.additive;
        curBlock.name = strip(curBlock.name[0..$-2]);
    } else if (curBlock.name.endsWith(":=")) {
        curBlock.modifiers ~= Modifier.redef;
        curBlock.name = strip(curBlock.name[0..$-2]);
    }
} else {
    error(filename, lineNum, "Something went wrong with: " ~ curBlock.name);
}

if (modMatch["modifiers"]) {
    foreach (m; splitter(modMatch["modifiers"], splitOnSpace)) {
        switch(m) {
            case "+=":
                curBlock.modifiers ~= Modifier.additive;
                break;
            case ":=":
                curBlock.modifiers ~= Modifier.redef;
                break;
            case "noWeave":
                curBlock.modifiers ~= Modifier.noWeave;
                break;
            case "noTangle":
                curBlock.modifiers ~= Modifier.noTangle;
                break;
            default:
                error(filename, lineNum, "Invalid modifier: " ~ m);
                break;
        }
    }
}
Used in section 18
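The behaviour of the pattern is easier to see on a couple of inputs. The sketch below reuses the same pattern with a run-time regex and returns the modifiers as plain strings; Header and parseHeader are hypothetical names, and the Modifier enum and the old-syntax handling are left out on purpose.

```d
import std.array : split;
import std.regex : matchFirst, regex;

/// Hypothetical result type for this sketch.
struct Header {
    string name;
    string[] modifiers;
}

/// Hypothetical helper: splits "Block Name --- mods" into its two parts.
Header parseHeader(string text) {
    auto re = regex(`(?P<namea>\S.*)[ \t]-{3}[ \t](?P<modifiers>.+)|(?P<nameb>\S.*?)[ \t]*?(-{1,}$|$)`);
    auto m = matchFirst(text, re);
    if (m.empty) {
        return Header.init;
    }
    if (m["namea"] != "") {
        // First alternative: a name, the separator, then the modifiers.
        return Header(m["namea"], m["modifiers"].split());
    }
    // Second alternative: a name only.
    return Header(m["nameb"]);
}

void main() {
    assert(parseHeader("Block Name --- noWeave +=") == Header("Block Name", ["noWeave", "+="]));
    assert(parseHeader("Block Name") == Header("Block Name"));
}
```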
Codeblocks end with just a ---. When a codeblock ends, we do the same as when it begins, except the new block we create is a block of prose as opposed to code.
Finally, if the current line is nothing interesting, we just add it to the current block's list of lines.
curBlock.lines ~= new Line(line, filename, lineNum);
Now we're done parsing the line.
When the end of the file is reached, the last section has not been closed and added to the chapter yet, so we should do that. Additionally, if the last block is a prose block, it should be closed and added to the section first. If the last block is a code block, it should have been closed with ---. If it wasn't, we'll throw an error.
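The closing rules can be sketched as follows. The stub classes carry only the fields this step needs, the function closeLast is hypothetical, and returning the error message as a string (instead of calling the project's error function) is a simplification for the sake of a self-contained example. Whether the section is still added after an unclosed code block is an assumption, modeled on what the @s handler does.

```d
// Minimal stand-ins; the real classes carry more fields.
class Block { bool isCodeblock; string name; string[] lines; }
class Section { Block[] blocks; }
class Chapter { Section[] sections; }

/// Hypothetical helper: closes the last block and section at end of file.
/// Returns an error message, or "" when everything was closed properly.
string closeLast(Chapter chapter, Section curSection, Block curBlock) {
    if (curSection is null) {
        return ""; // no section was ever opened
    }
    string err = "";
    if (curBlock !is null) {
        if (curBlock.isCodeblock) {
            err = "Unclosed block {" ~ curBlock.name ~ "}";
        } else if (curBlock.lines.length > 0) {
            curSection.blocks ~= curBlock; // trailing prose belongs to the section
        }
    }
    chapter.sections ~= curSection;
    return err;
}

void main() {
    auto chapter = new Chapter();
    auto section = new Section();
    auto prose = new Block();
    prose.lines = ["Some closing remarks."];
    assert(closeLast(chapter, section, prose) == "");
    assert(chapter.sections.length == 1 && section.blocks.length == 1);

    auto code = new Block();
    code.isCodeblock = true;
    code.name = "Main";
    assert(closeLast(new Chapter(), new Section(), code) == "Unclosed block {Main}");
}
```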
Parsing a change block is somewhat complex. Change blocks look like this:
@change file.lit
Some comments here...
@replace
replace this text
@with
with this text
@end
More comments ...
@replace
...
@with
...
@end
...
@change_end
You can make multiple changes on one file. We've got two flags for keeping track of which kind of block we are in: inSearchBlock and inReplaceBlock.
// Start a change block
if (startsWith(line, "@change") && !startsWith(line, "@change_end")) {
    curChange = new Change();
    curChange.filename = strip(line[7..$]);
    continue;
} else if (startsWith(line, "@replace")) {
    // Begin the search block
    curChange.searchText ~= "";
    curChange.replaceText ~= "";
    inReplaceBlock = false;
    inSearchBlock = true;
    continue;
} else if (startsWith(line, "@with")) {
    // Begin the replace block and end the search block
    inReplaceBlock = true;
    inSearchBlock = false;
    continue;
} else if (startsWith(line, "@end")) {
    // End the replace block
    inReplaceBlock = false;
    inSearchBlock = false;
    // Increment the number of changes
    curChange.index++;
    continue;
} else if (startsWith(line, "@change_end")) {
    // Apply all the changes
    string text = readall(File(curChange.filename));
    foreach (i; 0 .. curChange.index) {
        text = text.replace(curChange.searchText[i], curChange.replaceText[i]);
    }
    Chapter c = new Chapter();
    c.file = curChange.filename;
    // We can ignore these, but they need to be initialized
    c.title = "";
    c.majorNum = -1;
    c.minorNum = -1;
    Chapter includedChapter = parseChapter(c, text);
    // Overwrite the current file's title and add to the commands and sections
    chapter.sections ~= includedChapter.sections;
    chapter.commands ~= includedChapter.commands;
    chapter.title = includedChapter.title;
    continue;
}
// Just add the line to the search or replace text depending
else if (inSearchBlock) {
    curChange.searchText[curChange.index] ~= line ~ "\n";
    continue;
} else if (inReplaceBlock) {
    curChange.replaceText[curChange.index] ~= line ~ "\n";
    continue;
}
Used in section 16
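The collection half of this logic can be tried without touching any files. In the sketch below, Replacement and collectReplacements are hypothetical names; the two flags and the @replace, @with and @end markers behave as in the block above, while reading and re-parsing the changed file is left out.

```d
import std.algorithm.searching : startsWith;
import std.string : splitLines;

/// Hypothetical pair of search text and replacement text.
struct Replacement {
    string search;
    string replacement;
}

/// Hypothetical helper: gathers the search/replace pairs of one change block.
Replacement[] collectReplacements(string src) {
    Replacement[] result;
    bool inSearchBlock, inReplaceBlock;
    foreach (line; src.splitLines()) {
        if (line.startsWith("@replace")) {
            result ~= Replacement("", ""); // start a new pair
            inSearchBlock = true;
            inReplaceBlock = false;
        } else if (line.startsWith("@with")) {
            inSearchBlock = false;
            inReplaceBlock = true;
        } else if (line.startsWith("@end")) {
            inSearchBlock = false;
            inReplaceBlock = false;
        } else if (inSearchBlock) {
            result[$ - 1].search ~= line ~ "\n";
        } else if (inReplaceBlock) {
            result[$ - 1].replacement ~= line ~ "\n";
        }
        // Lines outside both blocks (comments, @change, @change_end) are skipped.
    }
    return result;
}

void main() {
    string src = "@change file.lit\nSome comments here...\n"
        ~ "@replace\nold line\n@with\nnew line\n@end\n@change_end";
    auto pairs = collectReplacements(src);
    assert(pairs.length == 1);
    assert(pairs[0] == Replacement("old line\n", "new line\n"));
}
```

Since every collected line is stored with a trailing newline, a search text only matches whole lines of the target file.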