Mastodawn

Peter Murray Jun 21, 2024

What if we built our library systems without patron data?

This idea has been bouncing around my head for many months, and I finally had to get it out into the world. This first part talks about the theoretical aspects. I think there are at least two other parts: one that looks at changes to the #FOLIO_LSP API to accomplish this and a second that talks more in depth about operational details. Comments welcome!

https://dltj.org/article/ils-without-patron-data/

The ILS without patron data: a thought experiment

Library systems hold significant information about patrons, including their search and reading histories. For librarians, ensuring the privacy and confidentiality of this data is an essential component of professional ethics. In the United States, for example, the third point in the American Library Association Code of Ethics is “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”

To understand this better, consider how the Video Privacy Protection Act of 1988 arose in the U.S. after the controversy surrounding the publication of Robert Bork’s video rental history. A year earlier, Robert Bork was nominated to the U.S. Supreme Court. In the course of his confirmation hearing, a reporter published Bork’s video rental history. Although this list of videos was not a factor in his rejected nomination, the fact that the list was published was found to be outrageous enough to spur Congress to pass the law. Similarly, if your library records were made public, it could well be embarrassing and intrusive. (Side note: While there is no federal protection for personal library records like the one for video rental records, state laws offer a patchwork of protections.)

Library systems, like the video rental systems of old, tie personally identifiable details to patron activity. So, what if we could separate these details? Before we delve into this, let’s define some terms related to Federated Identity systems. Skip these sections if you know about Federated Identity systems.

Federated Identity Systems: Identity Providers and Service Providers

In our complex world, library services often come from multiple providers. Rather than have the hassle of separate logins and passwords, it is common for these providers to call back to a central service where people can prove they are who they say they are. The place where people log in is called an Identity Provider (IdP).
The place where people want to go is called a Service Provider (SP). A Federated Identity System is a trust relationship and a set of agreements/technologies that enable the sharing of identity information and authorizations across systems. It allows people to access resources and services across different systems using a single set of credentials, typically managed by their Identity Provider. (IdPs are sometimes called Assertion Parties because these are the software systems in the trust relationship that make assertions about who a user is; SPs are sometimes called Relying Parties because they are trusting the IdP’s assertions.)

Federated Identity systems exchange attributes about someone. Those attributes can be specific to a person, like “name” and “email address”, or general categories, like “student” or “community-member”. Attributes can also have special meanings to the IdP and SP, like Pairwise-Subject-ID.

Pairwise Subject Identifier

An identifier that is specific to a user is called a “subject identifier”. These typically look somewhat like an email address with parts specific to both the user and the organization. For example, in murraype@dltj.org, murraype is specific to me and dltj.org places the identifier in the context of my organization. In a Federated Identity system, the same subject identifier is given to every SP that asks for it.

However, if we don’t want multiple SPs correlating a user’s activities, we can use a “pairwise-subject-identifier”. Within this workflow, the IdP sends different identifiers to different SPs for the same person, making the identifiers unique to each IdP-SP pair. More formally, pairwise-subject-identifier (“pairwise-id”) is defined this way:

This is a long-lived, non-reassignable, uni-directional identifier suitable for use as a unique external key specific to a particular relying party.
Its value for a given subject depends upon the relying party to whom it is given, thus preventing unrelated systems from using it as a basis for correlation.

Typically opaque, these identifiers don’t offer additional information to the SPs trying to correlate activities between users. For instance, the pairwise-id between IdP-SP#1 is [email protected] and the pairwise-id between IdP-SP#2 is [email protected]. Not only are the two SPs unable to figure out if this is the same person, there is also no meaning in the identifier to find out who this person is in the first place.

Pairwise-ID as THE library system ID

In our ideal library system aiming to minimize personal data collection, the pairwise-id becomes the unique identifier in the library system. (There are some drawbacks to using the pairwise-id as the unique identifier…we’ll get to those later.) The first time the library system’s SP gets a new pairwise-id, it creates a new user record in the system. The system uses other attributes from the IdP to determine privileges for this new record: for instance, a “student” status gets a normal loan period, a “faculty” status gets an extended loan period, and a “conference visitor” status gets blocked from borrowing. The library SP is trusting the attributes received from the IdP (see the discussion above about the trust relationship for the assertions), so it does not need prior knowledge about the patron.

So other than knowing that the person is a specific individual with a recognized status in the organization, the library system knows nothing about the patron. If the patron’s borrowing and search history are leaked from the library system, the leaked records have nothing else to offer that would tie them to a person. (Again, there are de-anonymizing nuances, but those are for later discussion.)
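
To make the two ideas above concrete, here is a minimal Python sketch. It is an illustration only, not how any particular IdP or library system does it: the keyed-hash derivation, the `IDP_SECRET` value, the function names, and the loan periods in days are all assumptions that do not appear in the post. It shows an IdP deriving a different opaque identifier for each SP, and a library SP creating a record the first time it sees one.

```python
import hashlib
import hmac

# Hypothetical secret known only to the IdP (an assumption for this sketch).
IDP_SECRET = b"example-secret-held-only-by-the-idp"


def pairwise_id(subject: str, sp_entity_id: str, scope: str) -> str:
    """Derive an opaque identifier that differs for each IdP-SP pair."""
    message = f"{subject}|{sp_entity_id}".encode("utf-8")
    digest = hmac.new(IDP_SECRET, message, hashlib.sha256).hexdigest()
    # Keep the email-like shape: opaque local part, organization scope.
    return f"{digest[:32]}@{scope}"


# Hypothetical loan rules keyed by the status attribute the IdP asserts.
# None means the status is blocked from borrowing.
LOAN_DAYS_BY_STATUS = {"student": 28, "faculty": 120, "conference visitor": None}

# In-memory stand-in for the library system's user table.
users = {}


def provision_on_first_login(pid: str, status: str) -> dict:
    """Create a user record the first time a pairwise-id is seen."""
    if pid not in users:
        users[pid] = {
            "status": status,
            # Unknown statuses fall through to None, meaning no borrowing.
            "loan_days": LOAN_DAYS_BY_STATUS.get(status),
        }
    return users[pid]
```

Because the SP's own identifier is part of the keyed hash, two SPs holding identifiers for the same person cannot match them up without the IdP's secret, which is the property the definition above asks for.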
…but I need to send overdue notices to the patron

Let’s consider some operational aspects that usually require personal data: sending overdue notices, applying fees to a patron, and handling patron requests. The library system knows enough about its patron community to check out books to authorized users: people with attributes coming from the IdP that we trust and use to set how long the loan needs to be. What if a user keeps a book too long…we need a way to send a notice to a person to return the book and to bill them when they don’t return it. But the only thing the library system has is an opaque identifier that only has meaning at the IdP.

Library systems are typically self-contained: they send their own email messages and have their own billing systems for keeping track of patron charges. In a library system without patron data, though, we need to rely on others with more information about the person to handle those tasks.

Let’s take the example of sending notices to the patron. Rather than the library system sending the notice itself, our system tells another system to do it. The group that runs the IdP has a service that, when given a pairwise-id and the content of a message, will send that message to the patron for us.

Another example: billing the patron when they say they’ve lost the item or the library declares it missing. The IdP group has another service that takes in the pairwise-id, a currency amount, and a description, then adds that information to the person’s central account. The library keeps track of the fact that a pairwise-id has been billed, but it never knows the person behind that identifier. If the item turns up again, our library system reverses the charge: it sends the pairwise-id, a credit amount, and a credit description.

Library patrons also request items be held for them; what do we do in this case? When someone requests an item, the library system prints a “paging slip” that is used to get the item from the shelf.
The paging slip has information about the item (its title, author, and shelving location) as well as information about the person who requested it. The paging slip usually turns into the hold pick-up slip; it is taped to the outside of the book and shelved alphabetically by the patron’s last name. There is a serious privacy downside to this workflow, though: everyone from the staff member pulling the item to the other users browsing the hold-pickup shelf can see the name of the person who asked for it. Instead, our library-system-with-no-names prints a random three-word phrase to stand in for the name of the person who asked for the item. This same three-word phrase is sent in the hold-pickup message to the library patron so they can find the item on the hold-pickup shelf.

But could we build it?

While this thought experiment is theoretical, could a real-world library system actually function this way? In the next post, we’ll explore possible adaptations for the FOLIO Library Services Platform to turn theory into practice.
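
The three-word phrase and the relayed hold-pickup notice described above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the word list, the `IdpNoticeService` class, and `notify_hold_ready` are hypothetical names invented for this sketch, and the IdP-side service is simulated with an in-memory directory.

```python
import secrets

# Hypothetical word list. A real deployment would want a much larger list,
# since 16 words give only 4,096 possible phrases and collisions are likely.
WORDS = [
    "amber", "birch", "comet", "delta", "ember", "fjord", "grove", "heron",
    "ivory", "juniper", "kestrel", "lagoon", "maple", "nectar", "otter", "pebble",
]


def three_word_phrase() -> str:
    """Pick three random words to stand in for the patron's name."""
    return " ".join(secrets.choice(WORDS) for _ in range(3))


class IdpNoticeService:
    """Stand-in for the service run by the IdP group.

    Only this side knows which real address belongs to a pairwise-id.
    """

    def __init__(self, directory: dict):
        self._directory = directory  # pairwise-id -> real email address
        self.outbox = []             # messages "sent", kept for inspection

    def send(self, pairwise_id: str, message: str) -> bool:
        address = self._directory.get(pairwise_id)
        if address is None:
            return False  # unknown identifier: report failure to the caller
        self.outbox.append((address, message))
        return True


def notify_hold_ready(service: IdpNoticeService, pairwise_id: str,
                      title: str, phrase: str) -> bool:
    """Library side: knows the pairwise-id and the phrase, never the address."""
    message = f"{title} is ready for pickup. Look for the slip marked: {phrase}"
    return service.send(pairwise_id, message)
```

The library side only ever passes the pairwise-id and the message text; the lookup to a real address happens inside the service that the IdP group runs.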

Disruptive Library Technology Jester

Peter Murray Jun 22, 2024

Can we build a minimal-personal-knowledge library system?

Starting with the #FOLIO_LSP as a base, I think we can. There are a small number of APIs that will need different implementations and a few APIs added, but the rest of the platform remains the same! I outline the details here: https://dltj.org/article/ils-without-patron-data-folio/

The ILS without patron data: a thought experiment realized with FOLIO

In the previous blog post, I outlined the concept of a library system with no personally identifiable information as a way to safeguard a patron’s right to privacy. Library systems commonly retain traces of a patron’s library activity, and the librarian ethos protects a patron’s privacy as they conduct their research and borrow items from the library. If our library systems decoupled patrons’ personal information from their library activity, the risk of leaked information from the library systems would be significantly reduced. In this blog post, I examine how a modern library service platform could be modified to handle this minimal personal knowledge system.

As you may recall, this proposed system uses pairwise-subject-identifiers (“pairwise-id”) from an organization’s identity provider (“IdP”) to identify people. Our service provider (“SP”) uses that identifier internally and calls external services that can find out who the pairwise-id is when necessary. I’m using the library services platform with which I’m most familiar: FOLIO. As an open-source library services platform, FOLIO offers a relatively straightforward path for such customizations.

In the following sections, I’ll examine what our library system SP needs to do when encountering a new pairwise-id for the first time, how to send patron notices and bill patrons, and changes to the hold-request subsystem. I’ll also discuss some changes that are needed to FOLIO itself. For the sake of brevity, I’m calling this FOLIO version “FILP”, the FOLIO Identity Limited Platform.

A New Pairwise-ID is seen at FOLIO login

[Figure: The FOLIO Settings → Tenant → SSO settings pane]

FOLIO includes a SAML SP endpoint that assumes user records have already been loaded into the system. Configuring this endpoint requires naming the SAML attribute that will contain the person’s unique identifier and which field of the FOLIO user record has that identifier.
In this example, the FOLIO SP is looking for the user identifier in the uid SAML attribute from the IdP and will search the contents of the External System ID field in the user record.

In our FILP version, we could use the SAML module unmodified; we would need to pre-load user records with the pairwise-id in the External System ID field. FOLIO user records have four required fields: patron group, active/inactive status, email address, and last name. The pre-loaded data would include the patron group appropriate for the pairwise-id patron and an “active” status. The pairwise-id is also copied into the email address field; I’ll describe later in this post how the pairwise-id is used in the FILP version of the email module. In the last name field, we can put the random three-word phrase that will be used for hold-pickup notices. (More on this in the holds section below.)

Our FILP SAML login module can also create user records on-the-fly when a new pairwise-id is seen. The IdP sends attributes (such as “student” or “faculty”) to the FILP SP that are needed to determine the appropriate patron group; the settings for the SAML module would contain a table that maps those attributes to patron groups. The pairwise-id is copied to the email address field, and a random last name will also be recorded in the new user record.

New Email Delivery Module

FOLIO has a built-in email module with a simple API for outbound email. Other FOLIO modules send a POST to the /email endpoint with a JSON body that contains the email details, including the to address and the body of the message. The built-in email module has configuration settings for the SMTP server, and it takes responsibility for sending the message. Our FILP version of the email module has the same API signature as the built-in module: it listens for POST requests to the /email endpoint and accepts an identical JSON body.
It is a drop-in replacement; the other modules in the FOLIO system don’t know that they are communicating with a FILP-enabled email module.

Remember from the previous post that the IT group running the IdP will need new services that act on behalf of our library system in cases where a patron’s identity must be known. One such service sends an email to a pairwise-id (say, the “IdP Pairwise Email Service”). This service takes the pairwise-id and looks up the actual email address. Also remember that we copied the pairwise-id to the email address in the user record. Our FILP email module reads the JSON body to get the pairwise-id in the ‘to’ field, then sends it and the message contents to the IdP Pairwise Email Service. The IdP Pairwise Email Service returns a success or failure message, which our FILP email module records in its database.

New Fee-Fine Module

As with the FOLIO email notification module, there is a single point that FILP will need to override to send fee/fine information to an external agent. Also, similar to the email module, the IT group running the IdP will need an IdP Pairwise Billing Service. When that service is given a pairwise-id, a charge/credit amount, and a message, it will post a transaction against the patron’s organization account.

FOLIO’s existing fee-fines module has a POST method to create a new fee and a PUT method to modify an existing fee. The FILP version of the fee-fine module is a drop-in replacement for those /feefines and /feefines/{feefineId} API endpoints, and it accepts the same JSON bodies as those endpoints. The ownerId field in the JSON body is the FOLIO user record identifier, and our FILP feefine module uses that identifier to look up the pairwise-id in the user record to forward the data to the IdP Pairwise Billing Service.

No changes to the Requests module

The third example in the previous blog post of the impact of our FILP minimal-personal-knowledge library system was item request pickup slips.
For context, the typical hold-paging-request workflow is for the library to print a paging slip that contains the title, author, and shelving location of the requested item along with the patron’s name and contact information. The pickup slip is attached to the book and placed on a hold shelf for the patron to pick up. In this typical workflow, the patron’s name is intimately tied to the requested material.

Instead of printing the patron’s name, we use a random three-word phrase stored in the FOLIO user record’s last name field when the record was created. That random phrase is printed on the pull slip. When FOLIO sends a hold pickup notice to the patron, the {{user.lastName}} replacement token is available to insert in the body of the message:

The item you requested, {{item.title}}, is now ready for pickup at the main library hold shelves. Items on the pickup shelves are sorted alphabetically using a three-word phrase. Your three-word phrase is {{user.lastName}}.

Changes Required within FOLIO

An important point in this description of how the pairwise-id is used in FOLIO is that the patron is the one logging into FOLIO to perform these actions. Currently, FOLIO performs circulation operations like a typical integrated library system: to check out an item to a patron, a staff member logs into FOLIO with privileges to perform the checkout function. That checkout function allows staff members (with the required permissions) to check out any item to any user record.

In our FILP FOLIO, though, the staff member won’t be able to scan a patron’s barcode to identify the patron…the patron will need to log in through the IdP single sign-on system so the pairwise-id is transmitted to FOLIO. Since it is the patron that is logged into FOLIO at this point, we will need a new API endpoint for a function that checks out an item only to the logged-in user record (rather than any user record).

FOLIO differs from previous library systems in that patrons are “first class” users.
The only thing that differentiates a library staff member’s account is the permissions on their user record. As described above, an access services staff member will have permission to use the Checkout app to register a loaned item on any user record. A patron user will need a permissions set that allows access only to their user record. Several other endpoints will need similar modifications: an endpoint that records a hold request for the logged-in user, an endpoint that allows someone to set notification and pickup preferences for themself, an endpoint that requests a renewal for a checked-out item, and so forth.

Conclusion and the Way Forward

FILP, as described above, still has some potential ways to correlate library activity to a specific patron and possibly de-anonymize that person. This blog post is already nearly 2,000 words, so I’ll save that discussion for another post.

FOLIO’s architecture is excellent because it is almost possible to build the FOLIO Identity Limited Platform (FILP) today. Replace a few back-end modules and add API endpoints where capabilities are scoped to an individual user record, and we’re pretty much there. This article’s subtitle is “a thought experiment realized with FOLIO”. It is almost enough for a statement of work.

I’ll add a plug for the company that I work for here in the last paragraph. If your library would like to do this with FOLIO, Index Data specializes in this type of software development. Few things would please me more than having the chance to build this into FOLIO. Contact me if you want to discuss this further or enter into a development agreement to add this capability to the FOLIO open source codebase.
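
As a rough illustration of the login-time provisioning and the drop-in email relay described in this post, here is a minimal Python sketch. It is not FOLIO code: the key names in the record are loosely modeled on the four required fields named above and should be checked against the actual FOLIO user schema, and `PATRON_GROUP_BY_ATTRIBUTE`, `new_user_record`, `handle_email_post`, and the `idp_send` callable are all hypothetical names introduced for illustration. The only request fields assumed are the `to` address and `body` mentioned in the post.

```python
from typing import Callable

# Hypothetical settings table mapping IdP attributes to patron groups.
PATRON_GROUP_BY_ATTRIBUTE = {"student": "Student", "faculty": "Faculty"}


def new_user_record(pairwise_id: str, attribute: str, phrase: str) -> dict:
    """Build the minimal record for a pairwise-id seen for the first time.

    Raises KeyError if the IdP attribute has no patron group mapping.
    """
    return {
        "externalSystemId": pairwise_id,  # matched by the SAML login module
        "patronGroup": PATRON_GROUP_BY_ATTRIBUTE[attribute],
        "active": True,
        "email": pairwise_id,  # placeholder; the IdP resolves the real address
        "lastName": phrase,    # random three-word phrase used on hold slips
    }


# Stand-in for the module's own database of delivery outcomes.
delivery_log = []


def handle_email_post(body: dict, idp_send: Callable[[str, str], bool]) -> dict:
    """Handle the JSON body of a POST to /email by relaying it, not by SMTP.

    `idp_send` represents a call to the IdP Pairwise Email Service and
    returns True on success, False on failure.
    """
    pairwise_id = body["to"]  # the "address" is really the pairwise-id
    delivered = idp_send(pairwise_id, body["body"])
    entry = {"to": pairwise_id, "delivered": delivered}
    delivery_log.append(entry)
    return entry
```

Keeping the same request shape is what lets the other modules stay unaware of the swap; only the delivery step behind the endpoint changes.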

Peter Murray Jun 24, 2024

Tonight I published the third piece about a minimal-personal-knowledge library system with some open questions: restricting the circulation app to in-building use, the risk that my proposed system is still subject to deanonymization, and discovery layer integration challenges.

https://dltj.org/article/ils-without-patron-data-details/

Thanks for all of the discussion so far.

The ILS without patron data: open questions

In my prior two posts, I outlined a strategy to minimize personally identifiable information in library automation systems (idea overview, impact on FOLIO). This approach uses a unique single-service identifier (the “pairwise-id”) recognized exclusively by the identity provider (IdP) and the library’s service provider (SP), effectively preventing any cross-system correlation of an individual’s activities. The only personal information the library system stores is the pairwise-id, meaning that there are no exposed names, addresses, phone numbers, or other demographic details in the event of a system breach. When the library system needs to notify the user or post a charge to the user’s account, it invokes the “IdP Pairwise Email Service” and the “IdP Pairwise Billing Service.”

You might ask why we’re going to these lengths. Why put in the work to create these extra email and billing services? The goal is simple: to make potential attacks less fruitful. By limiting the storage of personal information and narrowing the APIs that access it, we create fewer avenues for potential exposure. This approach frees resources to focus on the remaining sections, like the IdP, Pairwise Email Service, and Pairwise Billing Service, which access personal information.

This approach also strengthens the privacy of the remaining library workflows. For example, access services staff members only see the pairwise-ids, not the patrons’ actual names or other personal details, as they check in items or process hold requests. Of course, there may be circumstances when library staff need access to a patron’s personal information. To accommodate such cases, we could add a new FOLIO app that retrieves these details for authorized personnel. Any such access would be recorded and subject to auditing to prevent misuse.

In this post, I’m finishing up this series (for the time being?) with a collection of additional details and open questions, starting with a correction.
Correction: Library Staff Check-out app

I added a correction to the previous post about implementing a patron-data-minimizing library service in FOLIO. In the section about checking out a book, I mentioned that the only way for a user to check out an item was for the user to log into the library system via the IdP. Also, I said that existing functions could not be utilized, such as when a library staff member with appropriate permissions uses the Check-out app to register an item loaned to a patron by scanning the patron’s ID card barcode.

My colleague Mike Taylor pointed out that this was incorrect. In my own mind, I had taken the minimal patron record one step too far. We can indeed use the barcode field in the user record; this barcode could either be from a pre-loaded patron record or supplied by the IdP as an attribute when the patron logs in for the first time. Once the barcode is in place, the existing Check-out app can function as it currently does. Nevertheless, libraries must be mindful of potential risks as barcodes are visually accessible and not as easily changed as passwords.

Securing Circulation Station

Related to the Check-out app, we need a strategy to control where check-outs can occur. If a patron is logging into FOLIO to use the Check-out app, we’ll ideally want this process confined to the library building. A potential solution might involve using client HTTPS certificates; with this method, FOLIO would only provide access to the Check-out app if the user’s browser presents a client certificate installed exclusively on the circulation stations. Keycloak could be beneficial in this regard. In EBSCO’s presentations about Keycloak replacing the original authentication mechanism, location-based login was highlighted as an advantage.

Deanonymization

While these modifications have minimized personal data in the library system, we haven’t completely eliminated it.
A patron’s activity itself (the stream of topics browsed, articles downloaded, and items borrowed) can act as a fingerprint of their interests. The elements of this fingerprint can be quite distinctive when considering their content, time-of-day, and location. With sufficient data, an intruder could potentially link the activity back to the individual behind the pairwise-id.

There are strategies to mitigate the risk of accumulating patron activity. For example, the IdP could generate a fresh pairwise-id for each login by the patron. In this scenario, the IdP would need to maintain a record of all pairwise-ids, and would likely want to implement automatic user provisioning (where the library system SP automatically generates a new user record for every new pairwise-id).

This approach presents new challenges, such as patron blocks that rely on the maximum number of checked-out items or the maximum amount of fees levied on a patron. Since the patron’s activities are now scattered across multiple user records in FOLIO, we need to introduce a “Pairwise Block Check Service.” This service would take a pairwise-id, track down all other pairwise-ids tied to the same patron, and tally their total loans and library fees. It would return a yes or no answer on whether the circulation transaction can proceed.

Deanonymization is a topic where a lot of research is ongoing. We would want to engage these researchers to make sure our approach of limiting the correlation of patron activity is sound.

Discovery integration

FOLIO doesn’t come with a built-in discovery layer. This was an intentional design decision, aimed at defining clear boundaries that allow for the integration of a library’s preferred discovery layer using well-defined and versioned APIs. As it stands, all known discovery layer integrations connect to FOLIO using a central account with permissions to access all users’ circulation records. These records are fetched using a patron identifier, such as the pairwise-id.
However, this method makes the discovery layer’s FOLIO user account a potential security vulnerability. Ideally, we would want each patron to log into FOLIO using their own account. Doing so would naturally restrict each user’s visibility to their personal record. At the moment, I’m uncertain whether such an indirect (transitive) login setup is feasible. In other words, can a patron log into their chosen discovery layer via the IdP, and could the discovery layer then use this authentication to log into FOLIO?

All Done?

So, I think that is it…I’ve gotten all the parts of this idea rolling around in my head out into the world. Thanks for the discussions on Mastodon and elsewhere about the specifics, and I’m looking forward to hearing more thoughts and, if necessary, integrating them into a fourth blog post.

I feel compelled to express gratitude for having a system like FOLIO to explore this idea in a tangible way. FOLIO’s primary emphasis on an API-first approach makes this concept feel more feasible. When I say API-first, I mean there are no hidden APIs within FOLIO: for every task that can be performed in the user interface, a well-defined, versioned API exists to facilitate the same function. Beyond the user interface, the modules within FOLIO are compartmentalized by function and communicate with each other using the same well-defined, versioned APIs. As a result, replacing a module to adapt FOLIO for unique uses is entirely viable.
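
The “Pairwise Block Check Service” from the deanonymization section could look something like the following Python sketch. Everything here is an assumption made for illustration: the class and method names, the idea that the library system supplies an `activity` lookup returning loans and fees per pairwise-id, and the rule that a transaction proceeds only while both totals are strictly under their limits.

```python
from decimal import Decimal
from typing import Callable, Dict, Tuple


class PairwiseBlockCheckService:
    """Runs alongside the IdP, which is the only party able to link
    the many per-login pairwise-ids back to one patron."""

    def __init__(self, max_loans: int, max_fees: Decimal):
        self.max_loans = max_loans
        self.max_fees = max_fees
        self._patron_of: Dict[str, str] = {}  # pairwise-id -> internal patron key

    def register(self, patron_key: str, pairwise_id: str) -> None:
        """Record a freshly issued pairwise-id against its patron."""
        self._patron_of[pairwise_id] = patron_key

    def may_proceed(
        self,
        pairwise_id: str,
        activity: Callable[[str], Tuple[int, Decimal]],
    ) -> bool:
        """Answer yes or no without revealing which identifiers are linked.

        `activity` is supplied by the library system and returns
        (open loans, outstanding fees) for a single pairwise-id.
        """
        patron = self._patron_of.get(pairwise_id)
        if patron is None:
            return False  # unknown identifier: refuse rather than guess
        total_loans = 0
        total_fees = Decimal("0")
        for pid, owner in self._patron_of.items():
            if owner == patron:
                loans, fees = activity(pid)
                total_loans += loans
                total_fees += fees
        return total_loans < self.max_loans and total_fees < self.max_fees
```

Returning only a boolean keeps the linkage between identifiers on the IdP side, though the pattern of lookups made against the library system could itself leak which records belong together, so this part of the design would need the same scrutiny as the rest.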

ranti

@dltj I got a 403 error message. FYI.

kcoyle checking the perimeter Jun 24, 2024

@ranti @dltj ditto. the link looks like http://org.dltj.blog.s3-website-us-east-1.amazonaws.com/article/ils-without-patron-data-details

Peter Murray Jun 24, 2024

@kcoyle @ranti ARGH. Typo in link and the redirect didn’t work. Link in the mastodon post now fixed.

ranti Jun 24, 2024

@dltj @kcoyle Thanks!