I know there are some advantages to storing a generated version of the PDFs in the repo, e.g. there is always a version that can be pointed to from the website.

But having just been through the hassle of rebasing the dev-man branch onto a recent chunk of text changes, and being forced to handle a conflict for each and every commit (and probably also forcing a lot of that onto whoever wants to pull now), I think we need to re-think this. (I had no other problems...)

I propose that we figure out a way to upload the generated files to repo releases or somewhere else (website), so that we can get rid of the generated binary files from the repo to streamline the editing of the two branches.

I quickly looked for a git option to always ignore conflicts for some files but didn't find much. Possibly we could use a (custom?) merge-driver. Never heard about it before so that's why I added the "research" tag...

Replies: 10 comments · 20 replies

thoni56
Apr 5, 2021
Maintainer Author

After some googling I found https://gist.github.com/tmaybe/4c9d94712711229cd506 which explains merge drivers a bit, and also shows how to use one for exactly this purpose. So I'm gonna try this out on the dev-man branch and see what I can learn from it.
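
For reference, here is a minimal sketch of what such a driver could look like. Git calls a custom merge driver with the ancestor, current and other versions of the file (`%O %A %B`), reads the result from the `%A` file, and treats exit status 0 as a clean merge. A driver that changes nothing and exits 0 therefore always keeps the current version. The driver name `keepours`, the script name `keep_ours.py` and the `*.pdf` pattern are assumptions for illustration, not something taken from this repo.

```python
#!/usr/bin/env python3
"""Hypothetical merge driver that always keeps the current version.

Assumed setup (driver and file names are illustrative):
    git config merge.keepours.driver "python3 keep_ours.py %O %A %B"
    echo "*.pdf merge=keepours" >> .gitattributes
"""
import sys


def keep_current(ancestor, current, other):
    # Git reads the merge result from the `current` (%A) file and treats
    # exit status 0 as "merged cleanly". Leaving the file untouched and
    # returning 0 therefore resolves every conflict in favour of %A.
    return 0


# Only act as a driver when Git supplied the three file paths.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(keep_current(*sys.argv[1:]))
```

One caveat: during a rebase the "current" version is the one from the branch being rebased onto, so the generated files would still need a rebuild afterwards. A one-line alternative that is often suggested is to use the shell command `true` as the driver.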

Comment options

I propose that we figure out a way to upload the generated files to repo releases or somewhere else (website), so that we can get rid of the generated binary files from the repo to streamline the editing of the two branches.

We could keep the generated PDFs only in master branch. I remember that the reason we were also updating them in the dev branches was to check that everything was working fine, but maybe now we no longer need that.

I quickly looked for a git option to always ignore conflicts for some files but didn't find much.

I think that git rerere (reuse recorded resolution) might do the trick:

thoni56
Apr 6, 2021
Maintainer Author

Thanks for the ideas. Yeah, we could do away with the PDF in the dev-branch.

The "branching" model we decided upon was, at least as I see it, to maintain two versions of the documentation

  • master that is aligned with the current official release but that would be updated with better and better content continuously
  • dev-man that is aligned with the development snapshots, as new features are added this version of the documentation would also add it

We would continuously rebase dev-man onto master to ensure that at release a smooth merge of it onto master could be done. Thanks to asciidoc that is practically feasible. And working.

(Maybe that needs to go somewhere, for posterity, and maybe myself ,-), the wiki?)

Removing the PDF from one, or both, branches would help. But that leaves us with no "dev documents" to point to.

There is a CI-job, which currently only does asciidoc validation as far as I can see. Would it be a possible route to expand on that to build the documentation too? Maybe that was our intention all along. Setting up the toolchain in Travis would be a hassle, esp. with asciidoctor-fopub, but it would be interesting to try.

Then we could possibly upload the result from the branches to two different "releases", like "Official" and "Snapshot". Then those would be stable places to point to. And would contain corresponding, and updated, versions of the various documents. (Ok, only the manual would actually need the "Snapshot" version...)

I think I tried something like this (uploading to github) in some other project but did not finish it, probably because of lack of time. I would be interested to dig into this.

Maybe a new issue for it?

Comment options

ALAN Manual: Dev Snapshots

The "branching" model we decided upon was, at least as I see it, to maintain two versions of the documentation

  • master that is aligned with the current official release but that would be updated with better and better content continuously
  • dev-man that is aligned with the development snapshots, as new features are added this version of the documentation would also add it

Exactly. The point is that not every commit to dev-man would require rebuilding the docs; only significant changes to the documentation demand a rebuild, i.e. in order to grant developers access to bleeding edge features, or just benefit from updated contents. This judgement being arbitrary in nature, we didn't so far attempt to automate the process but just handled it by hand.

Removing the PDF from one, or both, branches would help. But that leaves us with no "dev documents" to point to.

The PDF is really only required in master branch; the dev docs could be just in HTML. After all, they are just a preview courtesy, so one format should suffice (also, the HTML is fully standalone, so it can be downloaded directly and consulted locally).

ALAN Manual: Dev Strategies

We would continuously rebase dev-man onto master to ensure that at release a smooth merge of it onto master could be done. Thanks to asciidoc that is practically feasible. And working.

After the first conflict problems, we've been careful to ensure that the dev branches are always rebaseable onto master, and we also rebased dev-man whenever master was changed. So, it should be fine if we keep going this way.

Workflow Notes

(Maybe that needs to go somewhere, for posterity, and maybe myself ,-), the wiki?)

We did jot down some notes in CONTRIBUTING.md, but maybe the Wiki could be a better place for a full-blown article on the maintenance strategies for specific documents. It seems that we often lose track of our knowledgebase contents (e.g. the Wiki pages on the problems with ALAN Italian have been there for years); as a general rule, every problem we faced and solved was documented somehow and somewhere, the problem is remembering where!

Since we've been diligently using Issue labels, milestones and Dashboard Projects, it should be fairly easy to find any topic via filtered searches. But this only covers Issues, not in-repo documents or Wiki pages...

Indexing the Existing Knowledgebase

We should strengthen the Wiki of each repository, creating some sort of smart index that allows us to retrieve all these guidelines and notes. The Wiki of the alan-if/alan repository might deserve more attention in this respect, and could become the main "official Wiki", making it easier to use it as a directory to find all knowledgebase articles (or links to Issues); otherwise this knowledge will be scattered across many repositories (and their Wikis). The only problem here is redundancy: some articles naturally belong to the Wiki of their repository, but having to keep an Index in every Wiki of the alan-if org might become cumbersome, unless we can come up with some sort of database solution that can auto-generate and update all these Indexes.

It might seem overkill, but the problem here is that our projects are actually over-documented (whereas usually the opposite is true). Writing guidelines and tutorials takes time, so it's a pity if these articles are not brought to fruition due to lack of visibility.

CI Automation

There is a CI-job, which currently only does asciidoc validation as far as I can see.

It only validates code styles consistency via EditorConfig, actually.

Would it be a possible route to expand on that to build the documentation too? Maybe that was our intention all along. Setting up the toolchain in Travis would be a hassle, esp. with asciidoctor-fopub, but it would be interesting to try.

I remember the problem being the asciidoctor-fopub part, which only works on Windows (see #66) due to Win-specific paths in our custom configuration (for fonts or styles, don't remember). The fopub configurations are not very friendly, and quite hostile toward collaborative editing and version control. Probably the best solution would be to check Asciidoctor's native PDF backend, which has been updated a lot in the meantime and probably solved all the issues that were preventing us from using it here (problems with footnotes inside tables, and a few other missing features, see: #9)

ALAN Manual Dev Snapshots: Release Strategy

Then we could possibly upload the result from the branches to two different "releases", like "Official" and "Snapshot". Then those would be stable places to point to. And would contain corresponding, and updated, versions of the various documents. (Ok, only the manual would actually need the "Snapshot" version...)

To achieve this we'd need to define a snapshot release strategy.

Sometimes only one of the two formats needs updating (e.g. because we improved the CSS of the HTML doc, or the template of the PDF); other times there might be just typo fixes, which don't necessarily call for an updated version.

Possible solutions could be:

  1. Regular updates every so often, e.g. every three or six months.
  2. Update the snapshot docs whenever a new Alpha version of ALAN is released.
  3. Others?

The problem with (1) is that end users might have to wait to gain access to new juicy stuff; whereas with (2) the problem is ensuring that we update the contents before releasing the new Alpha, since the build jobs would be automated via some CI cross-repo communications, based on release tags or branch merges.

I think I tried something like this (uploading to github) in some other project but did not finish it, probably because of lack of time. I would be interested to dig into this.

Personally, I'm for the manual approach. After all, we are always aware when new juicy contents are added to the Manual, so each one of us is free to update the HTML snapshot (and even the PDF) based on whether he thinks it's worth sharing it with bleeding-edge authors. It just boils down to deciding whether to add the built docs to the commit stage or not, so no big deal there. Since the snapshot preview links are pointing to the dev-man branch, and the doc names don't change, these links will always show the latest document that was pushed.

Maybe a new issue for it?

We've had a long discussion on this in #6 (now closed), we could reuse some of the text material from that thread if it saves us some typing.

About PDF Rebase Conflicts

Back to the original question of this Issue, which type of conflicts does rebasing on master bring about?

What I usually do when these conflicts come up (inevitably for both HTML and PDF builds, due to date changes or just Asciidoctor template changes) is to simply solve the conflict with either "ours" or "theirs" and then rebuild both docs on the fly and add them to the stage before committing the rebase. Bear in mind that when rebasing or merging we should always rebuild the docs from scratch (if we include them) because of the way dates and version numbers are handled, and because Asciidoctor updates often introduce CSS changes.

So the best strategy is to always manually rebuild the docs.

This can probably be achieved by some Git hooks/filters (when carrying out specific operations on dev-man, for example).

thoni56
Apr 6, 2021
Maintainer Author

Thanks for digging up #6, and some good thoughts on information handling, although the concrete approach for that is still up for discovery, decision and implementation ;-)

ALAN Manual: Dev Snapshots

The "branching" model we decided upon was, at least as I see it, to maintain two versions of the documentation

  • master that is aligned with the current official release but that would be updated with better and better content continuously
  • dev-man that is aligned with the development snapshots, as new features are added this version of the documentation would also add it

Exactly. The point is that not every commit to dev-man would require rebuilding the docs; only significant changes to the documentation demand a rebuild, i.e. in order to grant developers access to bleeding edge features, or just benefit from updated contents. This judgement being arbitrary in nature, we didn't so far attempt to automate the process but just handled it by hand.

Good. But I feel we have slightly differing ideas about how to manage the actual "results" of the branches. I read your comments as thinking around releases. I also think "releases" are important, but I also strive for "continuous deployment" to lessen the cognitive load on us to remember to create new "builds" only for "releases", and deliver the value of the change as soon as possible.

E.g. before extracting the manual to this alan-doc project, each development snapshot of Alan also contained a, mostly updated, version of the manual, consistent with the development snapshots functionality. In that model the "stable" documentation was actually that, stable. No improvements could be done in it until the next official release. We have now flipped this, and we can continuously improve the "stable" version. It would be A Good Thing™ (but not strictly required) if that change would immediately benefit readers/users.

...

ALAN Manual Dev Snapshots: Release Strategy

Then we could possibly upload the result from the branches to two different "releases", like "Official" and "Snapshot". Then those would be stable places to point to. And would contain corresponding, and updated, versions of the various documents. (Ok, only the manual would actually need the "Snapshot" version...)

To achieve this we'd need to define a snapshot release strategy.

Sometimes only one of the two formats needs updating (e.g. because we improved the CSS of the HTML doc, or the template of the PDF); other times there might be just typo fixes, which don't necessarily call for an updated version.

Possible solutions could be:

  1. Regular updates every so often, e.g. every three or six months.
  2. Update the snapshot docs whenever a new Alpha version of ALAN is released.
  3. Others?

As already indicated I'd suggest "generate new documentation on every commit".

To me, the release process is different from the continuous builds and deployment. The important thing is that the information in a document generated from master should always match functionality in the latest official release of Alan, and be marked with that version. Any kind of compatible improvement should go in master directly. Changes in functionality should result in a change in dev-man.

A release of Alan will thus also render some additional, manual, work when it comes to alan-doc, especially the manual. This work would then be

  • merge dev-man into master
  • on master: update release marking to the new release
  • on dev-man: update release marking to the next release, still with "development snapshot" label
  • done.

Just stating this, to ensure we are on the same page here, but I'm confident we are. (I'm ignoring the actual build here, since that is implied, being manual or not. I'm also ignoring the actual merge-branch of dev-man, as that is not important for the discussion.)

So when it comes to releases, to me, only the update of the documentation for the new functionality is important. Any other types of changes are inconsequential when it comes to releases and can be done, and published, at any time. An improvement should not need to wait for the next release.

But again, I think we think about this in slightly different ways. So let me put it this way:

What would be the worst thing that could happen if we build a new set of fully usable documentation on every commit?

It's not like there are API incompatibilities that break things, as for a software release. We also use SemVer-like semantics for the two branches, as does the Alan SDK with the official releases and the development snapshots, so the correlation between them is clear.

To be very clear, I'm not forcing the issue. Instead I think it is interesting that we seem to have differing reasoning here, and I'm interested in learning more about your viewpoint.

At the end of writing all this I realise that my concern is primarily the content, and even more so, specifically the manual. But you have worked hard, even struggling, with the toolchain, layout and such things, and also for the other documents. A random change in a tool configuration or version might actually trash things, warranting a "proof-reading" before actually releasing. Is this your concern? If so, I think I can feel where you are coming from...

Comment options

Good. But I feel we have slightly differing ideas about how to manage the actual "results" of the branches. I read your comments as thinking around releases. I also think "releases" are important, but I also strive for "continuous deployment" to lessen the cognitive load on us to remember to create new "builds" only for "releases", and deliver the value of the change as soon as possible.

E.g. before extracting the manual to this alan-doc project, each development snapshot of Alan also contained a, mostly updated, version of the manual, consistent with the development snapshots functionality.

As far as I can remember, the reason we didn't manage to come up with a solution was because of a number of unsolved questions that were preventing a CI solution:

  1. asciidoctor-fopub preventing a cross platform build.
  2. Unsolved discussion regarding the Manual last changed date attribute.
  3. ALAN code examples and transcripts being dynamically generated to match the ALAN version for which the Manual is being built — i.e. ensuring that any CI builds are using the correct alan and ARun versions, according to branch.

Also, so far we had only a single Manual release on master, which happened when the preview release of the StdLib came out, and it was very close to the Christmas Holidays, so there hadn't been much follow up on that.

But let's recapitulate the problematic points of the above list...

asciidoctor-fopub Problems

Unless we can find a solution to the fopub problem, it's going to be hard to come up with any CI solution. The current problem prevents using the same configuration files on both Windows and Linux, and we're now using Windows for the builds.

Possible solutions:

  1. Use a script to modify these configuration files so they work on Linux and the CI virtual machine (e.g. using SED).
  2. Create a Linux version of these settings in the repository (with a different extension), and have a script swap them in for the Windows files during the CI initialization job.
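
As a rough illustration of option (1), here is a sketch of a path rewrite done in Python rather than SED. The sample line, the Windows root and the Linux root are all made up for the example, since the actual contents of our fopub configuration files are not shown in this thread.

```python
import re


def to_posix_paths(text, win_root, posix_root):
    """Replace every path under win_root with its posix_root equivalent."""
    # Match the root plus the rest of the path. The tail stops at
    # whitespace, quotes and angle brackets, so paths containing spaces
    # are not handled by this sketch.
    pattern = re.compile(re.escape(win_root) + r"([^\s\"'<>]*)", re.IGNORECASE)

    def swap(match):
        # Turn the remaining backslashes into forward slashes.
        return posix_root + match.group(1).replace("\\", "/")

    return pattern.sub(swap, text)


# Hypothetical configuration line, for illustration only.
sample = '<font embed-url="C:\\fonts\\Alan\\Regular.ttf"/>'
print(to_posix_paths(sample, "C:\\fonts", "/usr/share/fonts"))
```

A CI initialization step could run something like this over a copy of each configuration file, leaving the Windows originals in the repository untouched.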

Alternatively, switch to Asciidoctor's native PDF backend (if it now supports all the needed features), but this would require some extra work before it becomes usable in CI production:

  1. Create an ALAN syntax and theme for the Rouge (Ruby) syntax highlighter, which is needed for this backend.
  2. Find a way to use our custom syntax with Rouge, because we might need to benefit from any changes in real time (i.e. can't wait for a PR to be merged into the upstream project). I have no idea if this can be done, or how.

Manual Date

I remember you proposed that the date attribute should be set by the build script, at conversion time, to avoid having to manually edit it in the source file each time. While this makes sense, there are also some undesired side effects to this:

It would mean that if the docs are rebuilt at every commit, the date would also change in the docs, regardless of whether contents have changed.

This matters especially for the PDF edition (which is usually intended for download, whereas the HTML doc is more for online consultation), because end users might end up downloading it again even if no real changes occurred (which translates to downloading and replacing their local copy, wherever that is stored).

IMO the last updated date should indicate when contents were last modified, and not change when the template was tweaked, or other non-meaningful cosmetic changes took place.
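
One way to reconcile the two positions could be to let the build script set the date, but derive it from the last commit that touched the content sources instead of from the build time. The sketch below assumes the date is passed to Asciidoctor as the `revdate` attribute. The source file name used in the usage note is hypothetical.

```python
import subprocess


def last_content_date(paths, repo="."):
    """Date (YYYY-MM-DD) of the last commit touching any of the given paths."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%cd", "--date=short",
         "--", *paths],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def asciidoctor_args(source, revdate):
    """Build an asciidoctor command line that pins the revision date."""
    return ["asciidoctor", "-a", f"revdate={revdate}", source]
```

With a hypothetical `manual.asciidoc`, the build would call `asciidoctor_args("manual.asciidoc", last_content_date(["manual.asciidoc"]))`. If the path list only names content files, commits that touch just templates or stylesheets would not move the date, although typo fixes in the sources still would.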

Dynamic Examples

Although right now the Manual doesn't use these (other docs here do!), in the near future we'll be adopting this approach more and more, since it proved successful for the StdLib Manual (see AnssiR66/AlanStdLib#82).

The idea is that ALAN code snippets in the docs should be extracted directly from a real source file (via include::) and that their output should be extracted from a real transcript generated by compiling the sample adventures with the matching alan and arun binaries which the document is being written for.

This would ensure that:

  1. All code examples are valid and compilable.
  2. All output samples match the real output generated by the current ALAN version.

This reduces the maintenance work of the examples, by allowing us to "set them and forget them", and have the toolchain automatically produce the correct results.

But it also means that we must ensure that we're using the correct ALAN binaries, both locally and on the CI server, which introduces the problem of having to use the latest Beta on master, and the latest Alpha on dev-man.


As already indicated I'd suggest "generate new documentation on every commit".

You mean on every commit the docs should be rebuilt and committed to the dev-man branch, even if the commit doesn't alter the Manual and its assets?
This could quickly lead to a huge bloat in the repository size, especially if the build script injects the date, since it would mean that at every commit the generated docs will differ at least in the date value.

A release of Alan will thus also render some additional, manual, work when it comes to alan-doc, especially the manual. This work would then be

  • merge dev-man into master
  • on master: update release marking to the new release
  • on dev-man: update release marking to the next release, still with "development snapshot" label
  • done.

Just stating this, to ensure we are on the same page here, but I'm confident we are.

I'm 100% with you on this, and I'm also a great fan of all things "auto-magic". It's just that I think that there are still some major problems with the Asciidoctor build that need to be addressed before we can setup a CI toolchain of this type.

Also, I believe you use Circle CI, which I don't know anything about (I use only Travis CI). In the meantime, GitHub Actions have also entered the scene, which seem an interesting way to handle CI tasks, especially with the Marketplace offering ready-made solutions which are maintained by the creators of these GH Actions, and even more so when the actions are built by the creators of the tools involved.

What would be the worst thing that could happen if we build a new set of fully usable documentation on every commit?

  1. Huge size bloat of the repository, especially if you inject the date attribute via the build script, because then we could never have identical output of the HTML and PDF files, so they'll always end up in every commit. Also, it seems to me that this would pollute those commits that don't deal with document changes, and would interfere with cherry picking and other interactive Git operations — and of course, we'll never be sure of whether these documents contain real changes or are just the result of this policy.
  2. The whitespace diff bug (that has been afflicting the StdLib repo for years). We also have tests for the source adventures in this repo. Just imagine if one of these sources stumbled on whitespace creeping in at every run: a CI job might trigger an endless series of commits, rebuilding the docs every time, probably until the CI VM crashes or you hit the monthly CI free-minutes limit.
  3. I'm not sure this would be a good service for end users, who might expect new versions of these documents to contain real changes, especially if they are downloading them.

To be very clear, I'm not forcing the issue. Instead I think it is interesting that we seem to have differing reasoning here, and I'm interested in learning more about your viewpoint.

I don't think we have different visions on this, it's just that each of us is focusing on different problems that are preventing this from happening. Whereas you're more focused on how to interconnect this with the ALAN release cycle, I'm more focused on the current unsolved problems of the Asciidoctor toolchain (which prevent any reliable CI build, right now).

Bear in mind that I've been spending quite some time trying to find solutions to a number of Asciidoctor toolchain related problems, and how to come up with a good solution that would work across different repositories that use Asciidoctor for ALAN — so I tend to be more aware of how far these solutions are.

Just to mention briefly one problem: ISO-8859-1 validation!

ECLint simply fails to validate ISO-8859-1 files, raising false positives for valid files. There doesn't seem to be a bullet-proof way to validate files for ISO-8859-1 encoding; you can mostly prove they are not UTF-8, or that they use a single-byte encoding. Yet we need assurances that our sources are valid ISO files, especially since modern editors tend to break ISO encoding with almost any paste operation.

The problem gets even worse when we don't have ALAN-specific file extensions for ALAN-related files (e.g. transcripts and solutions), because extensions like .log are usually associated with UTF-8 in most editors; hence my proposal to officially adopt .a3sol and .a3log, which I'm using in most projects anyhow. But the .i extension is also at risk of being corrupted by most editors, since it's a generic extension used by many languages for include files. Since neither Git nor GitHub offers much support for these legacy encodings, we really need a safe and trusted way to ensure they are correctly encoded.
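
To make the "you can mostly prove they are not UTF-8" point concrete, here is a heuristic sketch, not a validator. Every byte sequence decodes as ISO-8859-1, so the most a check can do is flag files that look like something else. The function name and the two heuristics are my own choices for illustration.

```python
def iso_8859_1_problems(data):
    """Return reasons why `data` (bytes) is unlikely to be ISO-8859-1."""
    problems = []
    high = [b for b in data if b >= 0x80]
    if not high:
        return problems  # pure ASCII is identical in both encodings
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        pass  # not valid UTF-8, which is what we hope for
    else:
        problems.append("decodes cleanly as multi-byte UTF-8")
    # 0x80-0x9F are C1 control codes in ISO-8859-1. In practice they hint
    # at Windows-1252 punctuation or stray UTF-8 continuation bytes.
    if any(0x80 <= b <= 0x9F for b in high):
        problems.append("contains bytes in the 0x80-0x9F range")
    return problems
```

An empty list only means that no red flag was found. A file that was re-saved in another single-byte encoding without touching the 0x80-0x9F range would still pass.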

A random change in a tool configuration or version might actually trash things, warranting a "proof-reading" before actually releasing. Is this your concern?

Not really, I mean ... the contents are usually well polished whenever we commit them, and the master branch should only contain finished work for a specific ALAN release. I'm more worried about the fact that there are so many different tools and standards involved that we need to make sure that every piece of the puzzle is solidly constructed, before handing all these to some automatic robot muncher.

If you had been struggling with the "spurious whitespace bug" that has afflicted the StdLib and Alan Italian repos (it suddenly disappeared in the latter, for unknown reasons) you would know how frustrating it can be to work with Git and ALAN sources when things go wrong: at every run the transcripts change, even if nothing was changed, so there's some complex interaction at play between the ISO encoding and Git's lack of support for it, possibly due to a small bug that spits out a char sequence which to Git is broken UTF-8. The problem is that these changes show up in Git's work space, and these are a nightmare on any CI job, since they prevent many Git operations.

Comment options

Publishing Dev Snapshots on an Orphan Branch

I've been giving some thought to the whole problem of how and where to "publish" the dev snapshots of the ALAN Manual (both PDF and HTML).

I think that committing them to the dev-man branch is only going to give us problems, both in terms of conflicts as well as in terms of size bloat.

A possible solution would be to tweak the build scripts so that they are branch aware (via a simple git query), so that when running on dev-man they output the PDF/HTML with a different name (or path) which is ignored in the repository. Then the script could commit the new documents to a new and separate branch, created especially for dev snapshot previews of the documents.

We could create this special branch as an orphan branch, which only stores snapshot previews documents and doesn't share any history with the main repo, so no possible conflicts could come from it, but we'd still be able to offer live preview links of the latest dev docs to end users.

This approach should lend itself well to the various CI tasks, and then we'd only have to focus on auto-rebuilding the Manual on master branch whenever a new ALAN release is out, which, I believe, is your main concern here, i.e. being able to synchronize automation between new ALAN releases and publishing the latest ALAN Manual on master.

Also, being an orphan branch, we could simply force commit at each documents build, effectively resetting the branch at every new snapshot, since we won't really be needing a commit history there; which means that its size would never bloat, even if we rebuild them at every single commit on dev-man — the only concern here would be that we might run out of the monthly free-minutes of the CI server (or GitHub Actions), which would mean that the CI would stop working until the next month, unless there are funds for more operations/minutes.

In any case, I think it's important that dev-snapshots of the Manual should be built with different names from those on master, to avoid all these annoying conflicts and keep them clearly separated (we could even just add some prefix or suffix to their names, e.g. dev_manual.pdf/.html).
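
A small sketch of what the branch-aware naming could look like. The `dev_` prefix and the dev-man branch name come from the suggestion above; the function names are hypothetical, and the current build scripts may well be structured differently.

```python
import subprocess
from pathlib import PurePosixPath


def current_branch(repo="."):
    """Ask Git for the name of the checked-out branch."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--abbrev-ref", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def output_name(branch, filename, dev_branch="dev-man", prefix="dev_"):
    """Prefix the output file name when building on the dev branch."""
    path = PurePosixPath(filename)
    if branch == dev_branch:
        return str(path.with_name(prefix + path.name))
    return str(path)
```

For example, `output_name(current_branch(), "manual.pdf")` would give `dev_manual.pdf` on dev-man and `manual.pdf` elsewhere. On a CI checkout with a detached HEAD the git query returns `HEAD` rather than a branch name, so the branch would have to come from the CI environment instead.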

Does it make sense to you?

5 replies
@tajmone

The more I think about the orphan branch solution, the more I'm convinced it's the right solution to this problem.

Now GitHub Pages lets you choose which branch should be the source for the website (before it used to have a fixed name and was an orphan branch by default), so all documents (both Beta and Alpha) would be accessible online via the GHPages website (which means that syntax highlighting would work on the ALAN Manual).

We could also add an Index page (either as a static website or using GitHub's native Jekyll) providing end users with a home page with links to every available document, both stable and dev-snapshot, all in the same branch (just different names, by adding prefixes).

The CI script could simply amend the previous commit at each build, since we don't really need a repo history there, which would mean that the newly generated documents will simply replace the older one, and no files are lost from the previous commit — i.e. the whole branch would end up always having a single commit which keeps being amended. Of course, we could still intervene manually (e.g. to update the index page) and this would not affect the CI, it would just amend the latest commit (regardless of whether we actually created a new commit or amended the previous one).

The whole branch would only contain the index.html page (a brief intro, links to the repo and the available docs, stable and dev) and the various documents (PDF and HTML), nothing else.
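
The single-commit idea could be sketched as the following command sequence. It is only a plan builder that returns the Git commands to run, under the assumption that the freshly built documents are copied into the work tree after the first command. The branch name `published`, the remote name and the commit message are placeholders.

```python
def publish_commands(branch, first_time=False, remote="origin",
                     message="Update published documents"):
    """Return the Git commands that keep `branch` at a single commit."""
    if first_time:
        # Start a branch with no history; tracked files leave the work tree.
        start = [["git", "switch", "--orphan", branch]]
        commit = ["git", "commit", "-m", message]
    else:
        start = [["git", "switch", branch]]
        # Rewrite the only commit instead of stacking a new one on top.
        commit = ["git", "commit", "--amend", "-m", message]
    return start + [
        ["git", "add", "--all"],
        commit,
        # The amended commit replaces the remote one, so a force push is needed.
        ["git", "push", "--force", remote, branch],
    ]
```

Each command could then be executed with `subprocess.run(cmd, check=True)`. Because every publish amends and force-pushes, anyone with a local copy of that branch would need to reset to the remote rather than pull.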

thoni56 Apr 7, 2021
Maintainer Author

I also think this looks promising.

thoni56 Apr 7, 2021
Maintainer Author

A first step to trying this out might be to amend the build scripts, or add new ones, to do the "publishing" to GH Pages? Then we could try it out with manual generation/deploy and get rid of the generated files in the repo if it works.

thoni56 Apr 7, 2021
Maintainer Author

Actually the very first steps would be to

  • Decide on the GH Pages branch (published? public?)
  • Create the empty, orphan branch (git switch --orphan <branch> seems to create it completely empty)
  • Decide on a storing structure for the documents (a subdirectory per document + one for the snapshot version of the manual, perhaps /alanguide, /manual, /manual-preview to start with?)
  • Hack a new script that publishes the manual.pdf to /manual
  • Amend that script so it only does it from master
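The first steps in the list above could be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, and `published` and `/manual` are the candidate names from the list, not settled decisions.

```shell
#!/bin/sh
# Sketch only: create the empty orphan branch and publish manual.pdf
# exclusively from master. Both function names are hypothetical.

# create_publish_branch <repo> <branch>
# Creates a parentless branch with an empty initial commit.
# "git switch" requires Git 2.23 or later.
create_publish_branch() {
    git -C "$1" switch --quiet --orphan "$2" || return 1
    git -C "$1" commit --quiet --allow-empty -m "Start empty publishing branch"
}

# publish_manual_pdf <source-repo> <publish-worktree> <pdf>
# Copies the PDF into <publish-worktree>/manual, but only when the source
# repository is on master; any other branch is refused.
publish_manual_pdf() {
    branch=$(git -C "$1" symbolic-ref --short HEAD) || return 1
    if [ "$branch" != "master" ]; then
        echo "refusing to publish from '$branch'" >&2
        return 1
    fi
    mkdir -p "$2/manual" && cp -- "$3" "$2/manual/manual.pdf"
}
```

Using `git symbolic-ref --short HEAD` rather than `git rev-parse --abbrev-ref HEAD` keeps the branch check working even on a branch that has no commits yet.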

When that works we can

  • figure out how to make a branch-aware scripted editing so that the preview manual is clearly distinguishable from the official. And then script the publishing of that into <publish_branch>/manual-preview.
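A branch-aware publishing script would need some mapping from source branch to target directory. A minimal sketch, assuming the `/manual` and `/manual-preview` layout suggested above (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch only: map the source branch to its publishing subdirectory.

# target_dir_for_branch <branch>
target_dir_for_branch() {
    case $1 in
        master)  echo "manual" ;;
        dev-man) echo "manual-preview" ;;
        *)
            # Any other branch has no publishing target.
            echo "no publishing target for branch '$1'" >&2
            return 1
            ;;
    esac
}

# Usage (not executed here):
#   dir=$(target_dir_for_branch "$(git symbolic-ref --short HEAD)") || exit 1
```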

Once that works we can

  • see if it's possible to put that into a CI flow somewhere (given that we know about all the problems in setting that up).

Does that seem right?

I'm looking forward to the birth of https://alan-if.github.io/alan-docs!

@tajmone

I also think this looks promising.

Yes, we remove the problem completely this way. The CI becomes completely autonomous and can't "interfere" with the repo, not even by accident (i.e. if things go wrong, due to changes in services, etc.).

A first step to trying this out might be to amend the build scripts, or add new ones, that does the "publishing" to GH Pages?

Sure, we can always set GHPages to point to the branch later, when we're ready for it.

Then we could try it out with manual generation/deploy and get rid of the generated files in the repo if it works.

Yes. And this way we'll be also free to do manual interventions on that branch, if we ever need to.

Decide on a storing structure for the documents (a subdirectory per document + 1 one for the snapshot version of the manual, perhaps /alanguide, /manual, /manual-preview to start with?)

In this case, we should opt for a static HTML website, rather than Jekyll, so we have more freedom regarding the structure (also, we don't really need Jekyll, just a simple page linking to the docs).

Does that seem right?

It looks fine to me, we only need to start playing around with it and see how it unrolls.

I'm looking forward to the birth of https://alan-if.github.io/alan-docs!

Me too. And I think it's going to be really great.

thoni56
Apr 7, 2021
Maintainer Author

Here's a quick attempt for some starting points for the discussion about the "release process" and other related matters in the Alan documentation project, but they are primarily from the perspective of releases of the Alan SDK and The Manual.

I decided to do this in a more structured manner so that we can pick one of the items and, thanks to the threading of discussions, focus on one item in one thread. Hopefully much better than our previous comment tennis.

Principles

  1. Don't store generated files in the repository - it is never a good idea to store generated files in the repo, I know we do it now, and why (because we haven't figured out where to put them...)
  2. Generated output should be possible to identify - it should be possible to identify from which source (release/commit/tag) it was created and what the content describes
  3. A reader of a document is always interested in getting the best version possible as long as it still describes the same "thing" - this introduces the notion of "compatible" and "incompatible" changes in a document, which is very much what master and dev-man are trying to handle when it comes to the two tracks of The Manual (and possibly other documents that contain descriptions of features in the Alan SDK)

Implications

1. Don't store generated files in the repository

A. Firstly this means that we need to generate the output at some point. Automatically or manually, with a preference for the former ;-) The natural suggestion would be on each commit. (Practical problems will be discussed later ;-)

B. We need to figure out where to store the generated output. I initially thought we could upload them somewhere, The Home Page maybe, and I experimented with GitHub Releases (don't know why that was so hard...), but that would only solve the problem of discrete files, like the PDF, not the HTML file tree. Orphan branches, a.k.a. GH Pages, sure look like one solution we can try. (I don't understand why you would even need a branch for this, but I suppose it's just the way GitHub decided to do it at one time...)

C. We need to decide on which "releases" of which documents and in which formats to store, and how to separate them. As usual, I mostly think about The Manual. And from that viewpoint I truly believe that we only need two simultaneous versions, the currently best version matching the latest official Alan SDK release, and the currently best version of the manual matching the development snapshots for the upcoming release of the SDK. This means that we should allow improvements in both those two branches and their respective generated output, possibly (preferably?) overwriting the previous.

D. At some point, probably at the release of a new SDK, we also need to "archive" a version of the manual for the just outdated official release.

2. Generated output should be possible to identify

Again from the perspective of the manual, there are two different identifications that are relevant for a document.

A. The version of the "thing" it describes. For the manual this is fairly obvious, namely the release of the Alan SDK. The same goes for documentation of the standard library. It is more interesting to think about what that means for The Guide and other such documents. Maybe the Alan SDK is the primary reference here, but in these cases is it the first release that supports what's in the guide?

B. The version of the document itself. The date of the generation or the commit would probably suffice. The date for the latest of the "content bearing" files would be another alternative. It would depend on your view of both what are meaningful changes to a document, and does a reader actually care about that change, but also would a reader use that information for anything? I personally think that the build date (or commit) would be good enough here.

3. A reader of a document is always interested in getting the best version possible as long as it still describes the same "thing"

A. The fundamental implication of this is that users prefer an updated online version to downloading.

B. There need not be a "stable" version of a document, it can always be updated.

C. A user is not interested to get back to an earlier version since the one she is looking at is the best ever ;-) (This might not be an implication but an assumption that could be discussed if we find a use case that contradicts it.)

D. The best version of a document is the latest, and the best version for a previous version of the SDK is the last version before introducing descriptions of new features. This implies that archiving for the previous release can be done at release of the new SDK (or whatever the thing is that it describes).

E. Pointers to documentation (e.g. on The Home Pages) should always go to generated output at the same place/url.

Problems

I suggest that we discuss problems with, and suggestions for implementation for, problematic points above in separate threads under a suitable heading.

@thoni56

thoni56 Apr 7, 2021
Maintainer Author

1.A Don't store generated output in repo, generate it instead

We already have scripts for generating the documents. There are a number of known problems moving those to (probably) any CI environment. Running them manually is probably sufficient for the immediate future if we can amend them with also storing the result somewhere.

Once storing is in place, we can continue our effort to automate everything in a CI-flow.

@thoni56

thoni56 Apr 7, 2021
Maintainer Author

2.A/B Generated output should be possible to identify (compatibility and document version)

(I thought discussions would allow us to build a complete tree of commenting, like Usenet, so we could have different branches in the discussion for each of my points. But that's obviously not how it works...)

We need to mark up the development snapshot/preview version of the generated document so that it is obvious that it is not describing the current release.

We will also regularly merge the dev-man branch to master which means that we can't have a source file configuration, like a variable or something. It must be scripted and modified depending on the branch name.

The scripting is easy, at least on Linux and macOS: you can easily replace some marker with something else to generate a non-committed file. I've done this in a number of projects (on Linux). You just have a template file from which a file is always generated, but with different content depending on (in this case) the branch, and that generated file is then used in the build process.
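The template approach could look something like the sketch below. The `@EDITION@` marker, the label text and the function name are all made up for the example; which file and which content to apply this to is still the open question.

```shell
#!/bin/sh
# Sketch only: generate an untracked file from a template, with content
# that depends on the branch. Marker and label are hypothetical.

# render_branch_template <template> <output> <branch>
render_branch_template() {
    case $3 in
        master) label="" ;;                               # official edition: no banner
        *)      label="Development snapshot (preview)" ;; # any other branch
    esac
    # Replace every occurrence of the marker with the branch-specific label.
    sed "s/@EDITION@/$label/g" "$1" > "$2"
}

# Usage (not executed here), with hypothetical file names:
#   render_branch_template edition.tpl edition.generated \
#       "$(git symbolic-ref --short HEAD)"
```

The generated file would be listed in the Git exclusion rules, so merging dev-man into master never carries branch-specific content along.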

What I don't know is what file and what content could we do this for to get the desired effect.

Comment options

OK, I'm commenting here in the root of the thread, just to cover the overall changes to the system, and make sure I'm getting it all right. Later on we can extract the main points in a more concise manner and add them to the sub-threads, to keep the conversation cleaner.

(I thought discussions would allow us to build a complete tree of commenting, like Usenet, so we could have different branches in the discussion for each of my points. But that's obviously not how it works...)

Unfortunately, subsequent posts from the same author are joined together; that's the current limitation. I should post on GitHub Community and ask for a new option to allow splitting or joining such follow-up comments.

Now, regarding the new system ... It will introduce some subtle but significant changes, which in some respect change everything in the current workflow.

I agree with all that you wrote, and will now share some extra thoughts, details and perplexities.

ALAN Manual Releases Scheme

(Let's focus only on the ALAN Manual right now, since it's the central publication of the project, and because other docs are not updated so often, or are not available in all formats, or are not SDK dependent. Also, if it works for the Manual, it will work for them too.)

In the old workflow, it was up to us to decide when to commit (and therefore publish) an updated version of the Manual. With the new system, any commit will update the documents on the new branch which serves them to end users.

And this is fine. I mean, there's really no point of tracking a Manual release, we only need (as you said) to provide two versions: one for the Beta SDK (which currently means stable), and one for the Alpha SDK (which currently means whatever latest Alpha SDK is available).

If every commit results in these two documents being updated, it also means that we've solved the original problem of developers having access to a commonly shared build document (PDF or HTML) which they can use to check problems and point them out — they'll simply be available online, via GitHubPages (at some point), and/or via the dedicated branch anyway.

Version Number/Release Date

As for the release versioning scheme, it seems we don't need one after all, the version of each document is the ALAN SDK to which it refers.

The build script should just define the date/timestamp attribute which the AsciiDoc source will incorporate into the final document — and thanks to Asciidoctor conditional evaluations, we'll be able to add any required text, e.g. for the Alpha version being a developer snapshot, or provide the correct SDK download links, etc. So that's not a problem, we are free to handle it whichever way we like.

So, as far as the version info goes, each document (PDF and HTML) should mention the ALAN SDK and the last-updated timestamp, which corresponds to the CI's time of execution. There's no risk of an older version replacing a newer one here, since it's all automated. As for end users' consumption, let's assume that whenever they decide to work on a new adventure they'll just re-download the latest documents (Beta or Alpha), and they shall be good with them for quite a while.

Nothing prevents us from having different build scripts in the repository, e.g. one used by CI and one used by developers to quickly produce a local preview. Indeed, this might be a desirable feature, since when building a local preview one wishes for the HTML doc not to embed images (so it will refresh them if they are tweaked), and wants a single-document output (whereas in the future we might also build a chunked HTML online version, for consultation, at least for the Beta Manual).

Alan repo vs Alan-Docs

Here I'm not sure I've understood what the CI will be doing...

Historically, the ALAN repository also included the Manual sources (as ODT) and had to automate creating the new PDF and including it in the various packages, at every new release. Now that the Manual (and other docs) have been moved to the ALAN Docs repository, things have changed.

I understand that whenever a new ALAN Beta is released, there's the need to merge dev-man into master, so that all the new features become available in the Beta manual. And, if I've understood correctly, you'd like to automate this part via some CI running on the alan repository, is this correct?

Since the build script acts on branch-awareness, merging should be sufficient, and the script will handle all the correct SDK references. All we need to do is ensure that inside the Manual source we never explicitly mention the current SDK, but use an attribute instead (e.g. ALAN_SDK), which will be substituted accordingly.
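For illustration, the build script could compute the attribute values from the branch and hand them to Asciidoctor via its `-a name=value` option. In this sketch `ALAN_SDK` is the attribute name suggested above, `revdate` is Asciidoctor's built-in revision date attribute, and the function name and version parameters are placeholders.

```shell
#!/bin/sh
# Sketch only: build the attribute options for Asciidoctor depending on
# the branch. The SDK versions are passed in by the caller; none are
# hard-coded here.

# asciidoctor_attr_args <branch> <beta-sdk-version> <alpha-sdk-version>
asciidoctor_attr_args() {
    case $1 in
        master) sdk=$2 ;;   # master documents the Beta SDK
        *)      sdk=$3 ;;   # other branches document the Alpha SDK
    esac
    # Versions are assumed to contain no whitespace, so the output can be
    # word-split by the caller.
    echo "-a ALAN_SDK=$sdk -a revdate=$(date -u +%Y-%m-%d)"
}

# Usage (not executed here; the source file name is hypothetical):
#   asciidoctor $(asciidoctor_attr_args "$branch" "$beta" "$alpha") manual.asciidoc
```

Inside the sources, `{ALAN_SDK}` would then be used wherever the SDK version is mentioned.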

This, of course, has also some implications on the Manual's workflow, for it means that we must ensure that the dev-man contents are always well polished (i.e. no incomplete contents should be on dev-man, instead multi-step editing should be done in a dev sub-branch, and merged only when production ready).

What I don't understand is whether you're also planning to auto-include the PDF and/or HTML Manual in the distribution packages. Again, this shouldn't be a problem, since the CI can simply build them on Alan-Docs or extract them from the (updated) publishing branch. In this case, again, the version number shouldn't be a problem: the latest Beta will contain the latest Beta Manual from the updated master branch (after the merge), and Alpha releases will (if they do need to) contain the latest versions from dev-man. As mentioned in the "Branch-Aware Scripts" section below, in the final build all version info and ALAN SDK tooling should be automatically handled, so the correct SDK info will be printed in each Manual, and the date will be that of the time the CI script was executed.

So, basically, we have the CI jobs on alan-docs that will ensure that with each commit the public available Beta and Alpha Manuals are updated, and then the alan repository CI that will handle the transition from dev-man merging into master when a new Beta is out. Correct?

Do you foresee some complications in this process? or some implications that might restrict the workflow on ALAN Docs?


Some considerations on the finer details being discussed...

Branch-Aware Scripts

The introduction of branch-awareness in the build scripts will not only solve the CI problem, but also the afore-mentioned problem of dynamic code examples and transcripts requiring the correct ALAN SDK at build time.

Since these dynamic examples are most likely going to be used more in the ALAN Manual in the future, let's just summarize how they now work on the StdLib.

In the StdLib repo we have the same problem: master branch should be the docs using the latest Beta SDK, whereas the dev branch often relies on the latest Alpha SDK (either because of new features and bug fixes, or just because the next release will be targeting the next ALAN SDK released).

The way I envision this feature being supported is by having inside the _assets/ folder an alan-sdk/ folder with two subfolders: alan-sdk/beta/ and alan-sdk/alpha/. The build script(s) will then query Git for the current branch, and depending on whether it's master or devel, define a variable pointing to the correct alan and ARun binaries, and use them further on in the script. The repository will not store the actual binaries (ignore them) but will offer some script that checks that the correct binaries are present in the above mentioned folders (by querying for -version) and auto-download them if they mismatch the current configuration (stored somewhere in the repository) — some simple script using cURL and an unpacker to download and extract the ALAN SDK(s).
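A minimal sketch of the selection and checking part might look like this. The folder layout is the one described above; the function names are hypothetical, and the assumption that `alan -version` prints a string containing the version number is unverified.

```shell
#!/bin/sh
# Sketch only: pick the SDK folder for the current branch and check that
# the compiler found there reports the expected version.

# sdk_dir_for_branch <branch>
sdk_dir_for_branch() {
    case $1 in
        master) echo "_assets/alan-sdk/beta" ;;
        devel)  echo "_assets/alan-sdk/alpha" ;;
        *)      return 1 ;;
    esac
}

# sdk_is_current <sdk-dir> <expected-version>
# Succeeds when <sdk-dir>/alan is executable and its version output
# mentions <expected-version> (output format is an assumption).
sdk_is_current() {
    [ -x "$1/alan" ] || return 1
    "$1/alan" -version 2>&1 | grep -F -q -- "$2"
}

# Usage (not executed here); the download step is left out on purpose:
#   dir=$(sdk_dir_for_branch "$branch") || exit 1
#   sdk_is_current "$dir" "$wanted_version" || echo "SDK must be downloaded"
```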

I believe that the same scripts can be used for the CI, and that they shouldn't pose any problem at all, after all the CI builds also want to ensure that the correct SDK binaries are being used, and that these will also have to be downloaded on the CI virtual machine in order to test and build the documents.

So, this is another problem that should be considered fixed, both for the build toolchain and the CI.

Re-Building at Every Commit

I've understood that your approach is that with every new commit the docs should be rebuilt. I'm a bit concerned about the free minutes limitation that was recently introduced by Travis CI (and GitHub Actions), and the possibility that having a full tests plus conversion(s) job might actually exceed these free minutes (I don't remember exactly how many free minutes there are, but they are not too many when dealing with big jobs, and they apply to the whole account).

Also, when it comes to dev-man, it's clear that every commit refers to the ALAN Manual, but on master branch we'll have many commits that are not related to the Manual at all, so I wonder if running the whole conversion and publishing task (unconditionally) is a good idea. Surely, we might want to run the conversion in order to check that the docs are building without error, but I guess that the update process will have to take into account whether the documents have really changed — which, they probably will always do, due to the timestamp injection.
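One way around the timestamp problem is to compare the sources rather than the generated output. The sketch below asks Git whether anything under a given path changed between two revisions; the function name and the path in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch only: decide whether publishing is needed by looking at the
# Manual sources instead of the (timestamped) build results.

# manual_sources_changed <repo> <old-rev> <new-rev> <path>
# Succeeds when anything under <path> differs between the two revisions.
# If Git itself fails (e.g. unknown revision) this also reports "changed",
# which errs on the side of rebuilding.
manual_sources_changed() {
    ! git -C "$1" diff --quiet "$2" "$3" -- "$4"
}

# Usage (not executed here), with a hypothetical source folder name:
#   if manual_sources_changed . HEAD~1 HEAD manual; then
#       echo "Manual sources changed: build and publish"
#   fi
```

The conversion could still run on every commit as a build check, with only the publishing step made conditional.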

I might be exaggerating my worries here; it's just that since Travis introduced the free minutes limitation I've started to be a bit paranoid about it. It's one thing if you exceed these minutes and just give up on the EditorConfig validation (the only Travis CI job currently in Alan Docs), and quite another if you end up skipping the update process when a new ALAN Beta release is out!

The asciidoctor-fopub toolchain is quite slow and time-consuming, and if we add up all the validation tests and conversions that may take place at every commit, I'm not sure how many Travis minutes each commit will amount to. There are periods where there isn't much activity on the repo, and others where all the efforts are invested in a single time span, so it's hard to tell. The point is that if you build the whole toolchain on a CI system, it's worth considering the consequences that exceeding the free minutes might have on the whole ALAN ecosystem (e.g. including the wrong version of the documentation in the generated packages).

Furthermore, whereas with dev-man we clearly have a branch which is dedicated specifically to the ALAN Manual for the Alpha SDK, with the master branch we're not dealing with a Manual-specific branch (we don't have a branch specifically for the Beta SDK Manual). Is that a concern? Should we have a branch dedicated to the Beta manual (e.g. dev-man-beta vs dev-man-alpha)? Would that improve our workflow somehow?

@tajmone
Comment options

Agreed, the terminology is now starting to be well formed, so we'll be able to use it in the guidelines documents.

Naming the branch published sounds very intuitive! Good choice.

As for the terminology:

  • "document(s)/doc(s)" should be clear enough according to context: if we're speaking of the repository source, it can only refer to source documents (AsciiDoc, Word, ODT, whatever format); if we're speaking of the published branch (or the GHPages website) then it refers to any document published (whether viewable only or downloadable).
  • If the context might lead to ambiguity: "source document(s)/doc(s)" vs "publications"/"published document(s)/doc(s)" should be clear enough.
  • Publications might also be further described as:
    • "online" — i.e. viewable in the browser (HTML, but also PDF, PowerPoint, etc. in most cases).
    • "downloadable" — a Zip archive, an exe eBook, etc.

The confusion was really only here, since we're jumping from one topic to another, but in the final documentation the linearity of the text and the context shouldn't lead to much confusion.

@thoni56

thoni56 Apr 11, 2021
Maintainer Author

And our comments are getting shorter and shorter ;-)

The published branch now exists with the latest version of the alpha and beta manuals and a simple first index page. I chose to pick a Jekyll theme (for looks) and editing can be done in Markdown. Peruse at https://alan-if.github.io/alan-docs/!

Any edits welcome!

I also added a quickly thrown together script for doing the publishing in the manual sub-dir: publish.sh. Did the last commit with it, so it seems to be doing what I want.

There was one drawback with an orphan branch. I have a git alias which I use all the time (log --graph --decorate --pretty=oneline --abbrev-commit) and in it the orphan branch seems to be "floating around", like it is looking for a parent ;-)

For a time it looked based on origin/old-master-tajmone then it jumped onto dev-man. So it gets in the way of clean graphs. I can't find anything online about this or how to ignore some branches in git log, so I'll have to live with it.
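For what it's worth, `git log` has an `--exclude=<glob-pattern>` option that drops matching refs from a following `--all`. A sketch of the alias with the publishing branch filtered out, assuming the refs are named `published` and `origin/published` (the function name is made up):

```shell
#!/bin/sh
# Sketch only: draw the commit graph for all refs except the publishing
# branch. --exclude must come before the --all it applies to, and for
# --all the patterns have to start with "refs/".

# graph_without_published <repo>
graph_without_published() {
    git -C "$1" log --graph --decorate --pretty=oneline --abbrev-commit \
        --exclude=refs/heads/published \
        --exclude=refs/remotes/origin/published \
        --all
}

# Usage (not executed here):
#   graph_without_published .
```

Note that `--all` also includes HEAD, so the branch still shows up while it is checked out.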

(I just remembered that you mentioned doing commit --amend on that branch, squashing everything to one commit obviously would help a bit.)

@thoni56

thoni56 Apr 11, 2021
Maintainer Author

Hmm, this SO answer contains an interesting description of "orphan branches" in git. They don't exist ;-) Or rather, only temporarily.

But when I check out the initial commit on published and do git show HEAD^ it protests wildly and says that is an unknown revision. So it really looks like it does not have a parent.

But the graph logs insist that it does, strange:

:
* e9265b6 Update Group and Cookbook links
| * c54e841 (origin/published) Publishing new version of beta manual
| * a32158f Make links a bit more obvious
| * 6eb5359 Make header match project name
| * e2c3707 Remove index.html
| * e9e23d4 Add an index.md instead
| * 341ff05 Create a simple index.html with links to the manual (alpha & beta)
| * 2468645 Set theme jekyll-theme-cayman
| * 87fc371 (HEAD) Initial versions of Alan Manual beta and alpha
| * 4f7a7c1 (origin/old-dev-man-tajmone) Clarified the difference between two variants of Schedule
| * 8bab6f4 ALAN Man: Rename Appendices Source Files
| * 0a705c7 Clarify that the width option is always overridden by actual value if possible
| * a797c47 Document block comments
|/
* 705620e (origin/old-master-tajmone) Asciidoctor ISO Inclusions
:
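Independently of how the graph is drawn, Git can be asked directly how many parents a commit has. A small sketch (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch only: count the parents of a commit. A root commit, such as the
# first commit of an orphan branch, has zero.

# parent_count <repo> <rev>
parent_count() {
    # "rev-list --parents -n 1" prints the commit hash followed by the
    # hashes of its parents, all on one line.
    line=$(git -C "$1" rev-list --parents -n 1 "$2") || return 1
    set -- $line
    echo $(( $# - 1 ))
}

# Usage (not executed here):
#   parent_count . 87fc371
#   git rev-list --max-parents=0 published   # lists the root commit(s)
```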

I'll delete my local tracking branch with git branch -d -r origin/published and that should be it. Yep. It'll show up on the next git fetch --all of course, but with this note to self I can probably remember how to get rid of it again.

@tajmone

That's strange, I mean in terms of Git handling orphan branches this way. As I mentioned, I did notice many problems in various Git GUIs when dealing with orphan branches. If it becomes too much of a problem, we could always set up another repository only for serving GHPages contents, directly from main branch.

@tajmone

Tomorrow I'll look at all the changes and start to catch up.

Comment options

Renaming the dev-man Branch

@thoni56, since the new publishing system is now up and running, I suggest you rename the dev-man branch to alan-manual-alpha so it's less ambiguous and it's not mistaken for the development baseline of the Manual.

Since you'll probably need to tweak your CI script for this, I'd let you do it. Once done, I only need to rename my local branch accordingly.

@thoni56

thoni56 Apr 22, 2021
Maintainer Author

Thanks. Done.

@tajmone

Stopped Tracking PDF and HTML Files

Ok, I've now stopped tracking all pre-built PDF and HTML documents from the repository, and added Git exclusion rules (including for *.xml files, which are assumed to be DocBook intermediate files for PDF conversion), except for the folder with the original documents, and the docinfo.htm files needed by Asciidoctor.

I've also updated accordingly all the README files, removing links to the generated HTML and PDF docs, all HTML Live previews, and added links to the new website, and other content updates regarding the Manual branches for Beta vs Alpha version.

Alpha Man Branch Rebased

I also rebased the alan-manual-alpha branch on master, which contained one last (hopefully the last!) HTML build conflict — but now that we don't track the HTML and PDF builds we shouldn't be seeing these conflicts any longer. I hope this is fine with you, after all the goal of these changes was to be able to always keep the alan-manual-alpha branch rebased on master.

Is it correct to assume that whenever one of us updates master he should also rebase the alan-manual-alpha branch accordingly? i.e. to prevent procrastinating the rebase, which might lead to conflicts if we start editing the dev branch if it's out of synch with master.

At least, this should hold true for all changes that don't involve the Manual sources — the only possible conflicts would be if a same paragraph was independently modified in both master and alan-manual-alpha, which would require a manual resolution via a three-way merge (could actually happen).
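The "rebase whenever master moves" policy could be supported by two small helpers like these. The function names are hypothetical, and the push in the usage note assumes the remote is called `origin`.

```shell
#!/bin/sh
# Sketch only: check whether alan-manual-alpha already sits on top of
# master, and rebase it if not.

# alpha_is_up_to_date <repo>
# Succeeds when master's tip is an ancestor of alan-manual-alpha.
alpha_is_up_to_date() {
    git -C "$1" merge-base --is-ancestor master alan-manual-alpha
}

# rebase_alpha_on_master <repo>
# Checks out alan-manual-alpha and replays its commits onto master.
# On conflicts the rebase stops with a non-zero status and has to be
# resolved (or aborted) by hand.
rebase_alpha_on_master() {
    git -C "$1" rebase --quiet master alan-manual-alpha
}

# Usage (not executed here):
#   alpha_is_up_to_date . || rebase_alpha_on_master .
#   git push --force-with-lease origin alan-manual-alpha
```

Since a rebase rewrites the branch, the other contributor has to reset or re-fetch their local copy afterwards.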

TODO: Slimming Down the READMEs

As a side note, while reviewing the various README files I noticed that over time they've gathered an info-overload; so I'm planning to also slim them down, by moving whatever makes sense to the Wiki and providing links instead. There's also some redundant info which should be deleted, and other stuff that is in the main README but should be moved to an independent repo document or Wiki page.

I've also noticed that there are all those batch build scripts which should be replaced with Bash scripts, to avoid having multiple scripts.

@thoni56

thoni56 Apr 23, 2021
Maintainer Author

Stopped Tracking PDF and HTML Files

Ok, I've now stopped tracking all pre-built PDF and HTML documents from the repository, and added Git exclusion rules (including for *.xml files, which are assumed to be DocBook intermediate files for PDF conversion), except for the folder with the original documents, and the docinfo.htm files needed by Asciidoctor.

Good. No more conflicts in generated files.

I've also updated accordingly all the README files, removing links to the generated HTML and PDF docs, all HTML Live previews, and added links to the new website, and other content updates regarding the Manual branches for Beta vs Alpha version.

Looks good. And that kind of makes the https://git.io/alan-docs official. I should update the link on the website, it probably points into the void now.

Alpha Man Branch Rebased

I also rebased the alan-manual-alpha branch on master, which contained one last (hopefully the last!) HTML build conflict — but now that we don't track the HTML and PDF builds we shouldn't be seeing these conflicts any longer. I hope this is fine with you, after all the goal of these changes was to be able to always keep the alan-manual-alpha branch rebased on master.

Is it correct to assume that whenever one of us updates master he should also rebase the alan-manual-alpha branch accordingly? i.e. to prevent procrastinating the rebase, which might lead to conflicts if we start editing the dev branch if it's out of synch with master.

At least, this should hold true for all changes that don't involve the Manual sources — the only possible conflicts would be if a same paragraph was independently modified in both master and alan-manual-alpha, which would require a manual resolution via a three-way merge (could actually happen).

Yes, I think that would be a good policy to minimise the risk of alan-manual-alpha accumulating too much "debt". And if the rebase causes non-trivial conflicts we can always call on each other for attention and assistance.

TODO: Slimming Down the READMEs

As a side note, while reviewing the various README files I noticed that over time they've gathered an info-overload; so I'm planning to also slim them down, by moving whatever makes sense to the Wiki and providing links instead. There's also some redundant info which should be deleted, and other stuff that is in the main README but should be moved to an independent repo document or Wiki page.

I've also noticed that there are all those batch build scripts which should be replaced with Bash scripts, to avoid having multiple scripts.

Yes, I think bash scripting is useful in more environments than .bat scripts. ;-)


This discussion was converted from issue #87 on April 07, 2021 13:07.