Retired software architect.
Linux and C++

Enjoy impossible software projects.

The Next Phase of SASSY (3)

The third class of data entry is intended for use during the initial discovery phase of a project when not much is known and a lot of note taking is required.

The idea is to have a mind mapping style of program that is in the form of a graph. It will likely get quite large so it will need to support search and a variety of graph visualisation techniques.

I imagine it will also have hooks to schema based entry forms for various types of data. These would include a glossary of domain specific terminology, a bibliography, and a directory of people and organisations associated with the project.

#SASSY

The Next Phase of SASSY (2)

The core task of the data entry system is to create a data entry form from an RDF schema.

I have created a simple instance of this, but it is quite limited in both its abilities and its style. Something more flexible would be nice.

I am thinking that an Entry Form Language would be a useful middle ground between the schemas and whatever it is that implements the data entry system. This would be an RDF schema that defined the elements of entry forms. Specific forms would be created from a combination of schema data and style information.

As a proof of concept I will have a look at Qt's QML. It is a language for defining user interfaces, and appears to have both static and dynamic capabilities. It can therefore support both the classical data entry task as well as something that evolves with the user's new schemas.

2/n

#SASSY

The Next Phase of SASSY

This morning I was contemplating what to do next on the SASSY project. The infrastructure is now mostly in place, so it's time to move up a level.

The make or break part of the project is getting the data out of the knowledge base. It's all a bit pointless without that capability. In order to get stuff out, it must first go in. Hence the next level will be support for data entry.

There appear to be three classes of data entry. The first is the classical style for any database application, where the schema is used to design the program. This is for those schemas that are provided as a part of the distributed system.

The second class is for project specific schemas. In this scenario the developers are creating and updating the schemas as they discover aspects of the project's domain. The data entry programs need to dynamically follow the evolving schemas.

The third class is for a pure discovery task. This will be something like a mind mapping program where a graph of ideas and concepts is constructed during the initial investigations for the proposed system.

1/n

#SASSY

I was watching Ms Hossenfelder's latest on the Fermi paradox.

I think that a couple of factors are missing from the Drake equation. It seems to me that there are some significant hurdles that an advanced civilisation needs to overcome for them to be visible to us.

We have been assuming that we would see their electromagnetic emissions. This would be true if they were space faring, as it's the only way to communicate between worlds (or even moons). Without space travel most communications would be via point to point connections, such as fibre optics, as there is not enough bandwidth in EM for the ubiquitous communications that we currently enjoy.

Two scenarios that would prevent space travel by an advanced civilisation are "water worlds" and "super Earths".

Developing technology might be possible on a water world via the use of enzymes rather than fire, but space travel would seem to be a big ask.

For planets that are just a bit bigger than Earth (higher g) the rocket equation would make getting to orbit infeasible.
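A rough way to put numbers on this is the Tsiolkovsky rocket equation, delta_v = v_e * ln(m0 / m1). The sketch below is only an illustration: the 9.4 km/s to low orbit and the 4.4 km/s exhaust velocity are assumed ballpark figures, and the scaling assumes a planet with the same mean density as Earth, so that orbital speed grows with the cube root of the mass. The function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Tsiolkovsky: delta_v = v_e * ln(m0 / m1), hence m0 / m1 = exp(delta_v / v_e).
// Returns the ratio of lift-off mass to final mass for a single stage.
double mass_ratio(double delta_v, double exhaust_velocity)
{
    assert(exhaust_velocity > 0.0);
    return std::exp(delta_v / exhaust_velocity);
}

// Assumption: same mean density as Earth, so radius scales as M^(1/3)
// and circular orbital speed sqrt(G*M/R) also scales as M^(1/3).
// Losses are scaled by the same factor, which is a simplification.
double scaled_delta_v(double earth_delta_v, double mass_in_earths)
{
    assert(mass_in_earths > 0.0);
    return earth_delta_v * std::cbrt(mass_in_earths);
}
```

With these assumed figures, mass_ratio(9400.0, 4400.0) is roughly 8.5 for Earth, while mass_ratio(scaled_delta_v(9400.0, 8.0), 4400.0) is roughly 70 for a planet of eight Earth masses. The exponential is what hurts: doubling the delta-v squares the mass ratio.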

It seems to me that Earth might be rather special by having enough mass to hold its atmosphere, but still allow rockets to get into orbit; while also having both large oceans and dry land.

#FermiParadox #DrakeEquation

Anonymous Nodes for SASSY

The anonymous nodes have now been implemented and tested.

The core part was doing a Base64 encoding of the 16 byte UUID. There are many implementations of this floating about, so, obviously, I had to invent my own. The thought was that there aren't any functions in standard libraries that can copy a single bit from one object to another. Time there was.

Two alternative designs were examined. One uses a mask to select the bit and another to set it in the destination. The other shifts the bits from and to the desired position. The latter requires no branches, so I thought it might be quicker, but it was actually about 10% slower.

The end result was a little slower than other solutions for Base64 encoding that process multiple bits at once. However, I do think this version is a lot clearer about what is being done.
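For illustration, the mask-based variant might look something like the sketch below. This is not the actual SASSY code: the names copy_bit and encode_uuid, the bit numbering (0 is the most significant bit), and the choice to drop the trailing padding are all assumptions made for the example. The alphabet is the URL-safe one, with hyphen and underscore as the two extra characters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Copy bit `from` of `src` into bit `to` of `dst` (bit 0 = most significant).
// One mask selects the source bit, a second sets or clears the destination bit.
inline void copy_bit(std::uint8_t src, unsigned from, std::uint8_t& dst, unsigned to)
{
    assert(from < 8 && to < 8);
    const std::uint8_t select = static_cast<std::uint8_t>(0x80u >> from);
    const std::uint8_t set = static_cast<std::uint8_t>(0x80u >> to);
    if (src & select) {
        dst |= set;
    } else {
        dst &= static_cast<std::uint8_t>(~set);
    }
}

// Encode 16 bytes (128 bits) as 22 URL-safe Base64 characters, no padding.
std::string encode_uuid(const std::array<std::uint8_t, 16>& uuid)
{
    static constexpr char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    constexpr std::size_t total_bits = 128;

    std::string out;
    out.reserve(22);
    for (std::size_t start = 0; start < total_bits; start += 6) {
        std::uint8_t index = 0;  // 6-bit value held in the low bits (positions 2..7)
        for (unsigned k = 0; k < 6; ++k) {
            const std::size_t bit = start + k;
            if (bit >= total_bits) {
                break;  // the last group has only 2 real bits; the rest stay zero
            }
            copy_bit(uuid[bit / 8], static_cast<unsigned>(bit % 8), index, k + 2);
        }
        out.push_back(alphabet[index]);
    }
    return out;
}
```

Every output character is built one bit at a time, which is why it cannot compete with encoders that handle three bytes per step, but the loop reads almost exactly like the definition of Base64.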

Now I just have to examine each of the 100 or so places that use blank nodes and see what changes to make.

#SASSY #RDF #programming

Upgraded to "Not Working".

This laptop is actually a nest of virtual machines. The host does very little beyond running libvirt and its associates.

All user data is on a separate drive that is provided to the VMs via NFS from a server VM. I have multiple users defined that I use for different purposes: testing, development, and general stuff. Each of these users can log into any of the VMs (except for one or two special ones) and see all of their data.

The gotcha with this configuration is the user's directories where programs save configuration data. There is no guarantee that the VMs will have the same version of the programs installed, so data in .local and .config could become confused, or worse.

My solution (from 2020) was to give each user a directory with subdirectories for each VM. A pair of soft-links would then map .local to the VM specific version, via a link in /var/run/$USER/. This link was set up by a script run from /etc/profile when the user logs in.
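The two-hop arrangement can be sketched as follows. The original was a shell script run from /etc/profile; this C++17 version using std::filesystem is only an illustration of the link layout, and the directory names ("vm", "local") and the function name link_vm_config are invented for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical layout:
//   <home>/.local    -> <run_dir>/local            (fixed, created once)
//   <run_dir>/local  -> <home>/vm/<vm_name>/local  (re-pointed at each login)
// Throws fs::filesystem_error if <home>/.local already exists as a real directory.
void link_vm_config(const fs::path& home, const fs::path& run_dir,
                    const std::string& vm_name)
{
    assert(!vm_name.empty());
    const fs::path target = home / "vm" / vm_name / "local";
    const fs::path hop = run_dir / "local";
    const fs::path dot_local = home / ".local";

    fs::create_directories(target);
    fs::create_directories(run_dir);

    // The per-machine hop is replaced every time, so it always names this VM.
    if (fs::is_symlink(hop)) {
        fs::remove(hop);
    }
    fs::create_directory_symlink(target, hop);

    // The link in the shared home directory never changes once it exists.
    if (!fs::is_symlink(dot_local)) {
        fs::create_directory_symlink(hop, dot_local);
    }
}
```

The link in the home directory is the same on every VM because it lives on the shared NFS drive. Only the hop is local to each machine, which is why anything that reads .local before the hop has been created finds a dangling link.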

This worked fine until recently. There are now several things that try to access .local before /etc/profile is called. They tend to not work very well. Some quietly fail with just a bit of inconvenience. Fedora 42 fails completely, as I found out with a recent upgrade.

Each VM is created with a conventional admin user, so I can easily apply a correction. It appears I will need to put the machine soft link in /etc so that it is always there. Some scripting in my future...

#Linux #virtualmachines

Anonymous Nodes for SASSY

The anonymous nodes will be resource nodes that use a URI with a fragment that is a generated id.

The generated ids in the RDF Redland C library use time and PID as their basis, with a local sequence number for each id generated in the session. This might have collisions in a distributed system - unlikely, but not quite unlikely enough. Something based on a UUID would be better.

The UUID library generates a 128 bit random number in 16 bytes. This string is not URI friendly. A URI fragment can have alpha and numeric characters and a very small set of other characters. Only two are necessary if we use base64 encoding - underscore and hyphen would seem to be the best choices.

To avoid soaking up all the randomness only one UUID needs to be created for each session. Subsequent requests can have a sequence number added. A decimal point, separating the UUID part from the sequence number, can make the values a bit more readable.
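As a sketch of what such a generator could look like: the class below creates one random 16 byte value per session, encodes it with the URL-safe Base64 alphabet, and appends a dot and a sequence number for each request. Everything here is an assumption made for illustration. The class name AnonymousNodeGenerator is invented, std::random_device stands in for the UUID library, and the encoder is the conventional multi-bit kind.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <utility>

class AnonymousNodeGenerator {
public:
    // `base_uri` is the URI the fragment is attached to; it must not be empty.
    explicit AnonymousNodeGenerator(std::string base_uri)
        : base_(std::move(base_uri)), session_(encode(random_bytes()))
    {
        assert(!base_.empty());
    }

    // Returns <base>#<session id>.<sequence number>, counting from 1.
    std::string next()
    {
        return base_ + "#" + session_ + "." + std::to_string(++sequence_);
    }

    const std::string& session() const { return session_; }

private:
    // Stand-in for the UUID library: 16 bytes from the system entropy source.
    // A real UUID would also set the version and variant bits.
    static std::array<std::uint8_t, 16> random_bytes()
    {
        std::random_device rd;
        std::array<std::uint8_t, 16> bytes{};
        for (auto& b : bytes) {
            b = static_cast<std::uint8_t>(rd() & 0xFFu);
        }
        return bytes;
    }

    // URL-safe Base64 without padding: 16 bytes become 22 characters.
    static std::string encode(const std::array<std::uint8_t, 16>& bytes)
    {
        static constexpr char alphabet[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
        std::string out;
        std::uint32_t buffer = 0;
        int bits = 0;
        for (std::uint8_t byte : bytes) {
            buffer = (buffer << 8) | byte;
            bits += 8;
            while (bits >= 6) {
                bits -= 6;
                out.push_back(alphabet[(buffer >> bits) & 0x3Fu]);
            }
        }
        if (bits > 0) {
            out.push_back(alphabet[(buffer << (6 - bits)) & 0x3Fu]);
        }
        return out;
    }

    std::string base_;
    std::string session_;
    std::uint64_t sequence_ = 0;
};
```

Since the Base64 alphabet has no dot in it, the last dot in the fragment always marks where the sequence number starts.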

We now have a representation for the anonymous nodes. Next is to write the generator....

#SASSY #RDF #programming

SASSY has hit a wall, part 3

So, how to proceed?

RDF is a foundational part of this version of SASSY. It is possible that other graph databases might get used in the final version, but for now it's RDF. It needs to be used correctly or it will be an unending source of problems.

This implies that probably all the blank nodes need to be replaced by resource nodes (URIs). These I will call anonymous nodes.

The plan:
+ Decide on the representation for these anonymous nodes. Sequence numbers or UUIDs seem to be the obvious alternatives.

+ Update my RDF library to create these nodes.

+ Write a program to turn the existing blank nodes into anonymous nodes.

+ Update all the SASSY projects to use the new code.

3/3

#SASSY #RDF

SASSY has hit a wall, part 2.

The rule engine uses SPARQL to find the data the rules are to be applied to. Blank nodes break this design.

When I first started SASSY I avoided blank nodes. As time went on I realised that generated names for nodes were going to save a lot of time, so I used them in lists and other places where names did not add any real information to the model.

My initial design for generated names was OK for toy examples, but was not going to work well for a multi-user distributed system. Hence I moved to using blank nodes that have identifiers which are mini-UUIDs.

So the early version of the rule engine worked OK since it used generated names. It is only the more recent data that uses the blank nodes, and that is where the world fell apart.

2/n

#SASSY #RDF #SPARQL

SASSY has hit a wall.

I was debugging the rule engine, finding easy bugs at first, then ones that were steadily harder to resolve as work proceeded. This is normal.

In the end the results were really strange with some odd messages from the underlying RDF library.

It seems I have misunderstood a fundamental component of RDF, the blank node.

The blank node is often described as being useful in things like linked lists or other structural uses where naming the node with a URI would seem redundant. This leads to problems.

The correct use is as a placeholder for when the identity of the node is not known. For example, the perpetrator of a crime before a suspect has been identified. There must be somebody, but they are unknown.

Mostly both uses work. The gotcha is SPARQL. You cannot give it a blank node to search on. If you do, it treats it as a variable to be bound to some value.

1/n

#SASSY #RDF