
Generate sqlite output from RakuDoc source input #359

Open
dontlaugh opened this issue Mar 25, 2024 · 9 comments

Comments

@dontlaugh
Collaborator

dontlaugh commented Mar 25, 2024

This issue is spawned from #75

The objective here is to create a schema and populate it with the data attributes parsed from https://github.com/Raku/doc

The resulting sqlite database could be used to power relational queries, such as listing all classes that implement a role, or all methods on a class and whether they come from roles, etc.
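To make the idea concrete, here is a minimal sketch of what such a relational query could look like. The schema, table names, and sample rows are purely illustrative assumptions, not part of any existing Raku/doc tooling:

```python
import sqlite3

# Hypothetical schema sketch; names and columns are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE class      (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE role       (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE class_role (class_id INTEGER REFERENCES class(id),
                         role_id  INTEGER REFERENCES role(id));
CREATE TABLE method     (id INTEGER PRIMARY KEY, name TEXT,
                         class_id INTEGER REFERENCES class(id),
                         from_role_id INTEGER REFERENCES role(id)); -- NULL if defined directly
""")
con.execute("INSERT INTO class (id, name) VALUES (1, 'Array')")
con.execute("INSERT INTO role  (id, name) VALUES (1, 'Positional')")
con.execute("INSERT INTO class_role VALUES (1, 1)")
con.execute("INSERT INTO method (name, class_id, from_role_id) VALUES ('of', 1, 1)")
con.execute("INSERT INTO method (name, class_id, from_role_id) VALUES ('sum', 1, NULL)")

# "All classes that implement a given role":
rows = con.execute("""
    SELECT c.name FROM class c
    JOIN class_role cr ON cr.class_id = c.id
    JOIN role r        ON r.id = cr.role_id
    WHERE r.name = 'Positional'
""").fetchall()
print(rows)  # [('Array',)]
```

The `from_role_id` column is one possible way to answer "does this method come from a role?" with a single join.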

A sqlite database can also be a potential intermediary format for building the website itself. E.g. the middle part of the Collection framework's pipeline could be simplified or extended if we can consistently parse our pod6 files into a relational database.

Somewhat related issues:

@finanalyst
Collaborator

The first step - I suggest - would be to create a routines.sql file from the data gathered to generate the routines page.

@patrickbkr
Member

Thinking big, this could also be a good starting point for a `rakudoc` command-line doc browser.

@finanalyst
Collaborator

@patrickbkr Let's see how the first step goes, but if routines works well, then we can put all the data from the Search function into SQL next. And then locally we could maybe have a Cro app running, so that SQL query results, in the form of HTTP links to individual pages, are served locally.

@dontlaugh
Collaborator Author

This full-text search module is available: https://sqlite.org/fts5.html. I've never used it myself.
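For reference, FTS5 is exposed through an ordinary virtual table, so it would slot into the same database file as the relational tables. A minimal sketch (the table name and sample rows are made up for illustration; FTS5 is compiled into most sqlite builds, including the one bundled with Python):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 virtual table: all columns are full-text indexed.
con.execute("CREATE VIRTUAL TABLE doc_fts USING fts5(title, body)")
con.execute("INSERT INTO doc_fts VALUES ('Array', 'An Array is a List that can be modified')")
con.execute("INSERT INTO doc_fts VALUES ('Hash',  'A Hash maps keys to values')")

# MATCH runs a full-text query against the indexed columns.
hits = con.execute(
    "SELECT title FROM doc_fts WHERE doc_fts MATCH 'modified'"
).fetchall()
print(hits)  # [('Array',)]
```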

@finanalyst finanalyst changed the title Generate sqlite output from pod6 docs input Generate sqlite output from RakuDoc source input Mar 30, 2024
finanalyst added a commit that referenced this issue Mar 30, 2024
- add sqlite-db plugin
- plugin accesses the dataset created for the Routines page in the website
- the data is formatted into the rows of an sqlite table
- the completed sqlite database is output into a filename specified in the `configs/03-plugin-options.raku` config file, so changing the value of the `db-filename` field of the `sqlite-db` sub-hash changes the output file name.
- the sqlite file is moved by Collection to a directory defined by 'database-dir' relative to the directory in which Collection runs. As set up here, the directory sqlite_dir will be at the same level as the existing rendered_html
finanalyst added a commit that referenced this issue Apr 3, 2024
* First step to issue #359, no change to the HTML output
- add sqlite-db plugin
- plugin accesses the dataset created for the Routines page in the website
- the data is formatted into the rows of an sqlite table
- the completed sqlite database is output into a filename specified in the `configs/03-plugin-options.raku` config file, so changing the value of the `db-filename` field of the `sqlite-db` sub-hash changes the output file name.
- the sqlite file is moved by Collection to a directory defined by 'database-dir' relative to the directory in which Collection runs. As set up here, the directory sqlite_dir will be at the same level as the existing rendered_html

* Separate out schema from data
- data is now in filename specified in plugin config
@dontlaugh
Collaborator Author

If - during the build - we can identify hyperlinks (whether relative or external), it would be useful to toss them into a `url` table. This can be used to drive an automated link-check test. Refs Raku/doc#4476
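A small sketch of what populating such a table during the build might look like. The table layout and helper function are hypothetical, invented here for illustration:

```python
import sqlite3
from urllib.parse import urlparse

# Hypothetical `url` table populated during the build; column names are illustrative.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE url (
    source_doc TEXT,    -- rakudoc file the link appears in
    href       TEXT,
    external   INTEGER  -- 1 if absolute http(s) URL, 0 if site-relative
)""")

def record_link(doc, href):
    """Classify a hyperlink found while parsing and store it for later checking."""
    external = 1 if urlparse(href).scheme in ("http", "https") else 0
    con.execute("INSERT INTO url VALUES (?, ?, ?)", (doc, href, external))

record_link("Language/101-basics.rakudoc", "https://raku.org")
record_link("Language/101-basics.rakudoc", "/type/Array")

# A link-check test could HEAD-request each external URL and verify
# relative URLs against the generated site tree.
external = con.execute("SELECT href FROM url WHERE external = 1").fetchall()
print(external)  # [('https://raku.org',)]
```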


That's just one use case. There are many. I'd like to start re-architecting the build to pivot around a normalized sqlite database. Here is a diagram.

[diagram: sqlite-centered build pipeline, with downstream processes A, B, C, D]

If we pull this off, the advantage will be that we can decouple each of the downstream processes that depend on the parsed data (A, B, C, D in the diagram). SQLite supports multiple processes reading simultaneously, so we can run them in parallel to speed things up significantly. I also think it will be easier for contributors (but I could be wrong, who knows).

The challenge is that the "DB Creation script", the build process entry point, becomes more complex. Effectively this component is a compiler from rakudoc to SQL statements.

rakudoc -> intermediate representations (IR) -> SQL statements

In principle, if we capture and normalize all the rakudoc source material, we can drive any downstream use case: static websites, man pages, tests, offline search.

@finanalyst
Collaborator

finanalyst commented May 29, 2024

@dontlaugh This is a very significant change. Honestly, I haven't thought of the build process in this way, and it will take me a while to think through how to do this. I think your rakudoc -> internal rep -> SQL normalisation is rather more difficult underneath than it seems.
Currently, I am implementing a renderer for RakuDoc v2. It should be noted that RakuDoc v1 was never properly implemented.
The new renderer works directly with the AST of the source files, and not with a compiled variable $=pod. The practical result is that the AST is produced about 6x faster than the compiled $=pod. In addition, we may be able to eliminate the 'caching' step, which can take about five minutes.
However, the AST representation of each source file could possibly be the sort of internal representation you are looking for.

@dontlaugh
Collaborator Author

> the AST representation of each source file could possibly be the sort of internal representation you are looking for

It sounds like it. I imagine that the complex program on the left-hand side would be best implemented by a library that works with a proper AST.

The data we need in normalized form will look different than an in-memory AST, but as long as each AST node can be serialized as text or bytes, we can store both in different database tables.
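One way to sketch that dual storage: keep the normalized tables for queries, plus a separate table holding each source file's serialized AST. JSON is used here purely as a stand-in serialization; the real RakuDoc AST would need its own serializer, and the table name is invented for illustration:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# One row per source file: path plus the serialized tree as a blob.
con.execute("CREATE TABLE ast (path TEXT PRIMARY KEY, tree BLOB)")

# Stand-in for a serialized RakuDoc AST node tree.
tree = {"type": "rakudoc", "children": [{"type": "head1", "text": "Basics"}]}
con.execute("INSERT INTO ast VALUES (?, ?)",
            ("Language/101-basics.rakudoc", json.dumps(tree).encode()))

# Round-trip: fetch the blob back and deserialize it.
stored = json.loads(con.execute(
    "SELECT tree FROM ast WHERE path = ?",
    ("Language/101-basics.rakudoc",)).fetchone()[0])
print(stored["children"][0]["text"])  # Basics
```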

> The new renderer works directly with the AST of the source files

I am pleased to hear that. What library is parsing the rakudoc into an AST? Can you link to it? Even if it is in nascent stages.

> This is a very significant change.

I agree. This is in the high-effort, but (potentially) high-reward category.

@finanalyst
Collaborator

@dontlaugh the new Rakudo compiler creates an AST for all programs, and the AST can be manipulated. Although the Rakudo AST compiler has not yet completely landed - this will be the raku.e milestone - there is enough for RakuDoc, and that has been backported into raku.d.

As an example, if you take a recent version of Raku, e.g. 2024.04, and run the following in a terminal (assuming the current directory is a local clone of Raku/docs/doc/Language/), you will get the AST of the file:

`raku -e 'say "101-basics.rakudoc".IO.slurp.AST'`

The new bit is the `.AST` method, which returns the AST of the input string.

@dontlaugh
Collaborator Author

Saw this in Rakudo Weekly: The Graph package may prove useful here https://raku.land/zef:antononcube/Graph
