
Generate sqlite output from RakuDoc source input #359

Open
dontlaugh opened this issue Mar 25, 2024 · 9 comments

Comments

@dontlaugh
Collaborator

dontlaugh commented Mar 25, 2024

This issue is spawned from #75

The objective here is to create a schema and populate it with the data attributes parsed from https://github.com/Raku/doc

The resulting sqlite database could be used to power relational queries, such as listing all classes that implement a role, or all methods on a class and whether they come from roles, etc.
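To make the idea concrete, here is a minimal sketch of what such a relational query could look like. The schema, table names, and sample rows are purely illustrative assumptions, not part of any existing Raku/doc tooling:

```python
import sqlite3

# Hypothetical schema sketch; names and columns are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE class      (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE role       (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE class_role (class_id INTEGER REFERENCES class(id),
                         role_id  INTEGER REFERENCES role(id));
CREATE TABLE method     (id INTEGER PRIMARY KEY, name TEXT,
                         class_id INTEGER REFERENCES class(id),
                         from_role_id INTEGER REFERENCES role(id)); -- NULL if defined directly
""")
con.execute("INSERT INTO class (id, name) VALUES (1, 'Array')")
con.execute("INSERT INTO role  (id, name) VALUES (1, 'Positional')")
con.execute("INSERT INTO class_role VALUES (1, 1)")
con.execute("INSERT INTO method (name, class_id, from_role_id) VALUES ('of', 1, 1)")
con.execute("INSERT INTO method (name, class_id, from_role_id) VALUES ('sum', 1, NULL)")

# "All classes that implement a given role":
rows = con.execute("""
    SELECT c.name FROM class c
    JOIN class_role cr ON cr.class_id = c.id
    JOIN role r        ON r.id = cr.role_id
    WHERE r.name = 'Positional'
""").fetchall()
print(rows)  # [('Array',)]
```

The `from_role_id` column is one possible way to answer "does this method come from a role?" with a single join.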

A sqlite database can also be a potential intermediary format for building the website itself. E.g. the middle part of the Collection framework's pipeline could be simplified or extended if we can consistently parse our pod6 files into a relational database.

Somewhat related issues:

@finanalyst
Collaborator

The first step - I suggest - would be to create a routines.sql file from the data gathered to generate the routines page.

@patrickbkr
Member

Thinking big, this could also be a good starting point for a `rakudoc` command-line doc browser.

@finanalyst
Collaborator

@patrickbkr Let's see how the first step goes, but if routines works well, then we can put all the data from the Search function into SQL next. And then locally we could maybe have a Cro app running, so that SQL query results, in the form of HTTP links to individual pages, are served locally.

@dontlaugh
Collaborator Author

This full-text search module is available: https://sqlite.org/fts5.html. I've never used it myself.
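For reference, FTS5 is exposed through an ordinary virtual table, so it would slot into the same database file as the relational tables. A minimal sketch (the table name and sample rows are made up for illustration; FTS5 is compiled into most sqlite builds, including the one bundled with Python):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 virtual table: all columns are full-text indexed.
con.execute("CREATE VIRTUAL TABLE doc_fts USING fts5(title, body)")
con.execute("INSERT INTO doc_fts VALUES ('Array', 'An Array is a List that can be modified')")
con.execute("INSERT INTO doc_fts VALUES ('Hash',  'A Hash maps keys to values')")

# MATCH runs a full-text query against the indexed columns.
hits = con.execute(
    "SELECT title FROM doc_fts WHERE doc_fts MATCH 'modified'"
).fetchall()
print(hits)  # [('Array',)]
```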

@finanalyst finanalyst changed the title Generate sqlite output from pod6 docs input Generate sqlite output from RakuDoc source input Mar 30, 2024
finanalyst added a commit that referenced this issue Mar 30, 2024
- add sqlite-db plugin
- plugin accesses the dataset created for the Routines page in the website
- the data is formatted into the rows of an sqlite table
- the completed sqlite database is output into a filename specified in the `configs/03-plugin-options.raku` config file, so changing the value of the `db-filename` field of the `sqlite-db` sub-hash changes the output file name.
- the sqlite file is moved by Collection to a directory defined by 'database-dir' relative to the directory in which Collection runs. As set up here, the directory sqlite_dir will be at the same level as the existing rendered_html
finanalyst added a commit that referenced this issue Apr 3, 2024
* First step to issue #359, no change to the HTML output
- add sqlite-db plugin
- plugin accesses the dataset created for the Routines page in the website
- the data is formatted into the rows of an sqlite table
- the completed sqlite database is output into a filename specified in the `configs/03-plugin-options.raku` config file, so changing the value of the `db-filename` field of the `sqlite-db` sub-hash changes the output file name.
- the sqlite file is moved by Collection to a directory defined by 'database-dir' relative to the directory in which Collection runs. As set up here, the directory sqlite_dir will be at the same level as the existing rendered_html

* Separate out schema from data
- data is now in filename specified in plugin config
@dontlaugh
Collaborator Author

If - during the build - we can identify hyperlinks (whether relative or external), it would be useful to toss them into a `url` table. This can be used to drive an automated link-check test. Refs Raku/doc#4476
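A small sketch of what populating such a table during the build might look like. The table layout and helper function are hypothetical, invented here for illustration:

```python
import sqlite3
from urllib.parse import urlparse

# Hypothetical `url` table populated during the build; column names are illustrative.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE url (
    source_doc TEXT,    -- rakudoc file the link appears in
    href       TEXT,
    external   INTEGER  -- 1 if absolute http(s) URL, 0 if site-relative
)""")

def record_link(doc, href):
    """Classify a hyperlink found while parsing and store it for later checking."""
    external = 1 if urlparse(href).scheme in ("http", "https") else 0
    con.execute("INSERT INTO url VALUES (?, ?, ?)", (doc, href, external))

record_link("Language/101-basics.rakudoc", "https://raku.org")
record_link("Language/101-basics.rakudoc", "/type/Array")

# A link-check test could HEAD-request each external URL and verify
# relative URLs against the generated site tree.
external = con.execute("SELECT href FROM url WHERE external = 1").fetchall()
print(external)  # [('https://raku.org',)]
```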


That's just one use case. There are many. I'd like to start re-architecting the build to pivot around a normalized sqlite database. Here is a diagram.

[diagram: sqlite-centered build pipeline, with downstream processes A, B, C, D]

If we pull this off, the advantage will be that we can decouple each of the downstream processes that depend on the parsed data (A, B, C, D in the diagram). SQLite supports multiple processes reading simultaneously, so we can run them in parallel to speed things up significantly. I also think it will be easier for contributors (but I could be wrong, who knows).

The challenge is that the "DB Creation script", the build process entry point, becomes more complex. Effectively this component is a compiler from rakudoc to SQL statements.

rakudoc -> intermediate representations (IR) -> SQL statements

In principle, if we capture and normalize all the rakudoc source material, we can drive any downstream use case: static websites, man pages, tests, offline search.

@finanalyst
Collaborator

finanalyst commented May 29, 2024

@dontlaugh This is a very significant change. Honestly, I haven't thought of the build process in this way, and it will take me a while to think through how to do this. I think your rakudoc -> internal rep -> SQL normalisation is rather more difficult underneath than it seems.
Currently, I am implementing a renderer for RakuDoc v2. It should be noted that RakuDoc v1 was never properly implemented.
The new renderer works directly with the AST of the source files, and not with a compiled variable $=pod. The practical result is that the AST is produced about 6x faster than the compiled $=pod. In addition, we may be able to eliminate the 'caching' step, which can take about five minutes.
However, the AST representation of each source file could possibly be the sort of internal representation you are looking for.

@dontlaugh
Collaborator Author

> the AST representation of each source file could possibly be the sort of internal representation you are looking for

It sounds like it. I imagine that the complex program on the left-hand side would be best implemented by a library that works with a proper AST.

The data we need in normalized form will look different than an in-memory AST, but as long as each AST node can be serialized as text or bytes, we can store both in different database tables.
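One way to sketch that dual storage: keep the normalized tables for queries, plus a separate table holding each source file's serialized AST. JSON is used here purely as a stand-in serialization; the real RakuDoc AST would need its own serializer, and the table name is invented for illustration:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# One row per source file: path plus the serialized tree as a blob.
con.execute("CREATE TABLE ast (path TEXT PRIMARY KEY, tree BLOB)")

# Stand-in for a serialized RakuDoc AST node tree.
tree = {"type": "rakudoc", "children": [{"type": "head1", "text": "Basics"}]}
con.execute("INSERT INTO ast VALUES (?, ?)",
            ("Language/101-basics.rakudoc", json.dumps(tree).encode()))

# Round-trip: fetch the blob back and deserialize it.
stored = json.loads(con.execute(
    "SELECT tree FROM ast WHERE path = ?",
    ("Language/101-basics.rakudoc",)).fetchone()[0])
print(stored["children"][0]["text"])  # Basics
```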

> The new renderer works directly with the AST of the source files

I am pleased to hear that. What library is parsing the rakudoc into an AST? Can you link to it? Even if it is in nascent stages.

> This is a very significant change.

I agree. This is in the high-effort, but (potentially) high-reward category.

@finanalyst
Collaborator

@dontlaugh the new Rakudo compiler creates an AST for all programs, and the AST can be manipulated. Although the Rakudo AST compiler has not yet completely landed - this will be the raku.e milestone - there is enough for RakuDoc, and that has been backported into raku.d.

As an example, if you take a recent version of Raku, e.g. 2024.04, and run the following in a terminal (assuming the current directory is a local clone of Raku/docs/doc/Language/), you will get the AST of the file:

`raku -e 'say "101-basics.rakudoc".IO.slurp.AST'`

The new bit is the `.AST` method, which returns the AST of the input string.

@dontlaugh
Collaborator Author

Saw this in Rakudo Weekly: The Graph package may prove useful here https://raku.land/zef:antononcube/Graph
