Podcast #1: Privacy Compliant Web Fonts

Show notes and transcription for the podcast episode: Privacy Compliant Web Fonts
Categories: docs, quarto, web, css, privacy, podcast

Author: miah0x41

Published: May 2, 2024

Modified: June 2, 2024


This post contains the show notes for the following Curious Data Explorer podcast:

Spotify Podcast YouTube Podcast Amazon Music Podcast

Increasingly, Data Explorers and Practitioners are using websites and web applications to communicate their insights and findings. Popular frameworks are effective in creating stakeholder-friendly visuals but often have their own distinct style. In many corporate environments in particular, this can prove distracting, leading to frustration amongst Data Explorers. Whilst changing colours and the position of elements is straightforward, changing fonts is not.

This episode explores the importance of fonts in websites and web applications, the challenges of using fonts installed on user machines, the use of external font providers and the associated privacy concerns. The episode then discusses how to address these concerns by using alternative external providers or self-hosting fonts. There is no evidence to suggest that any of the privacy-focused external providers can’t be trusted, but the option of self-hosting is a way to mitigate some of these concerns.

It should be noted that the episode is based on the GPDR Web Fonts blog post.

Show Notes

The following terms, services, concepts and libraries were mentioned in the episode:

HTML
Hyper Text Markup Language - the semantic structure of a website.
CSS
Cascading Style Sheets - the styling of a website.
Markdown
A lightweight markup language with plain text formatting syntax.
Hugo
An open-source static site generator.
Jupyter
An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Responsible for the design of the Jupyter Notebook format.
RStudio
An integrated development environment (IDE) for the R Language.
Word Documents
Microsoft Word documents.
PowerPoint Presentations
Microsoft PowerPoint presentations.
Google Documents
Google Docs documents.
CSS Frameworks
A pre-prepared library that is meant to be used as a foundation for a website.
Bootstrap
A popular CSS Framework.
MkDocs
A fast, simple and easy to use static site generator that’s geared towards building project documentation.
Python
An interpreted, high-level and general-purpose programming language.
Material for MkDocs
A theme for MkDocs based on Google’s Material Design guidelines.
Material Design
A design system developed by Google and used for Android and Web applications.
Docusaurus
An open-source project for building, deploying, and maintaining open-source project websites easily.
Streamlit
An open-source app framework for Machine Learning and Data Science projects.
Plotly
A technical computing company known for creating Plotly and Dash.
Dash
A productive Python framework for building web applications.
Shiny for Python
Python framework for web applications with reactive programming.
Shiny for R
R framework for web applications with reactive programming.
Jupyter Book
An open-source project for building beautiful, publication-quality books and documents from computational material.
Quarto
An open-source tool for authoring and publishing reproducible, production quality documents using Python, R, Julia, and Observable.
TaiPy
A Python framework for building web applications.
Reflex
Web applications in pure Python.
GDPR
General Data Protection Regulation - a regulation in EU law on data protection and privacy.
CCPA
California Consumer Privacy Act - a state statute intended to enhance privacy rights and consumer protection for residents of California, USA.
Google Fonts
A library of free licensed web fonts.
Firefox
A web browser developed by the Mozilla Foundation.
Developer Tools
A set of web browser tools that allow web developers to inspect and edit the HTML and CSS of a website.
Bunny Fonts
An open-source privacy-first web font platform.
SEO
Search Engine Optimization - the process of improving the quality and quantity of website traffic to a website or a web page from search engines.
Cloudflare Fonts
A feature of Cloudflare, a global Content Delivery Network (CDN), that focuses on improving performance and eliminating privacy concerns from Google Fonts usage.
Google Web Font Helper
A service that allows you to self-host fonts from Google Fonts.

Transcript

Welcome

Welcome to the Curious Data Explorer channel.

We cover topics for anyone interested in using data to improve their understanding and decision-making.

Our channel operates across three mediums – our Blog, our YouTube channel and our audio experience, which you are listening to right now.

Although we cover similar topics, the content we provide is bespoke to each medium.

This audio recording hasn’t been, for example, taken off a video and so we hope to provide a tailored listening experience – whether you’re listening to us on your commute, at the gym or on a rainy British day.

My name is Ashraf and I’m an experienced Data Practitioner. Today’s topic is Privacy Compliant Web Fonts, which I admit at first doesn’t sound like a topic for Data Explorers, but stay with me and I’ll explain why it’s unusually important.

But before we begin: like all our episodes, this one is about technical subjects and services.

And I want you to focus on what I’m saying rather than trying to guess the name of a new tool or a library.

Therefore, at the end of the episode, I will recap all the key terms, the concepts and the services.

They will also be available with our show notes on our Blog.

Introduction

Welcome to the show.

Today’s topic is Privacy Compliant Web Fonts.

And I intend to cover this in four parts.

Why should I, as a Data Explorer, care about fonts and in particular Web Fonts?

I’ll then introduce how fonts are customized for websites and web applications, how Web Fonts work and their potential privacy concerns, before finally implementing the method or multiple methods for Privacy Compliant Web Fonts.

It may feel that the role of a Data Explorer is to crunch numbers, to analyze data and provide an answer to an exam question.

But in practice, as most of our more experienced listeners will confirm, the role is ultimately of Communication, our ability to take complex inputs, distill them down into their essential parts and then communicate in a way that the wider business and our stakeholders can understand.

Sort of a translation or translator type role.

Increasingly, the medium of choice for communication is moving from static PowerPoint presentations to more dynamic websites and web applications.

This is then, therefore, of interest to us to be able to understand how to customize that and how we communicate.

So you might wonder, well, where do fonts play a role in all of this?

Well, if you’ve had that experience, as I have, where you’ve done maybe 40, 50 hours worth of analysis, you’ve finally got the answer you want, you then present it to your stakeholders.

And rather than focus on the message, they seem to be somewhat distracted and ask you, well, can you change that font?

Can you change that color?

And unfortunately, despite all that hard work, it’s very easy for people to become distracted if the communication medium doesn’t meet their expectations.

So it is therefore of interest to us to be able to understand how can we change or modify the web application output.

The scope of this discussion isn’t really around the basics of Cascading Style Sheets (CSS) and HTML, we’ll get into all those a little bit later.

It’s more about recognizing there is a need and then the thinking and thought process to try and achieve those goals.

But again, as Data Explorers and Practitioners, we may have encountered that situation or we may not have, we might have good stakeholders who aren’t so fussed about font sizes and the colors and so forth.

Types of Websites

It’s useful for our discussion today to adopt three categories when discussing websites and web applications.

The first category is around Static Websites, the second around Documentation Websites, and the third is about Interactive Dashboards and Computational Websites.

Static Websites, if like me 25 years ago, wow, that’s something a long time, a quarter of a century, you may have constructed using a text editor, using the HTML language and Cascading Style Sheets (CSS) to provide styling and also some dynamic elements.

But manually converting insights, manually converting messages and constructing websites can be quite difficult and time consuming.

As such, frameworks were created to try and support this, frameworks such as Hugo, which allowed you to provide the inputs in perhaps a more user friendly format such as Markdown, which as Data Explorers we’re quite familiar with, if in particular, if you use the Jupyter ecosystem or indeed the RStudio ecosystem.

So these frameworks would take these Markdown inputs, they would run it through a process and produce a static website which could be hosted anywhere.

And the idea behind a static website is there is no server-side requirement other than a web server.

There’s no database, there’s no server-side processing.

The second category that Data Explorers may have come across is around documentation websites, and here we have an overlap of interests with other types of developers, in particular software developers.

And much like them, we are keen practitioners of reproducible data science, and as such it’s useful to be able to write down the process we took, perhaps some of the insights we’ve got, or where can we get access to data, what steps we took, what cleaning methods we used and so forth.

So the idea is to document something rather than using more static Word documents or Google documents or PowerPoint presentations.

And in that category, we may have encountered frameworks such as MkDocs, a very popular Python-based, Markdown-based documentation system (the “Mk” is the Markdown component).

In particular, the Material Theme is a very popular choice, for example. We may have also encountered, on the other end, the more modern version of that, some might argue, through Docusaurus, for example.

So there are a number of frameworks which are intended, again, to make it easy to write things down, for example, using Markdown, plus some framework-specific syntax to get call-out boxes, to get footnotes, to get references, for example.

So in that second category, we may have encountered tools such as MkDocs and Docusaurus.

In the third category, we have interactive dashboards.

It’s hard not to be involved in the Jupyter ecosystem, in particular, and not to have heard of Streamlit.

But along with Streamlit, from Plotly, there’s been Dash, as another example.

And then we’ve got a bridging mechanism between these interactive dashboards and so-called computational websites.

So Shiny for Python was recently released, for example, on the back of Shiny for R, which kind of bridges that gap, but favours more the computational side than, say, the dashboard side, in which one can provide R Markdown, Jupyter Notebooks, or even plain Markdown documents that have interactive computations, which are then run and the assets themselves are presented in a website format.

On the extreme end of computational websites, we have Jupyter Book, which as the name suggests, is intended to allow users to author books based on Jupyter Notebooks.

Equally, the one that we use at the Curious Data Explorer channel is Quarto, which allows a user to take Markdown-flavoured inputs or Jupyter Notebooks and produce interactive websites and/or Blogging platforms.

The reason I want to highlight these three types of categories is because they’re all used for communication, whether internally or externally.

So static websites might be externally facing, but there are also internal examples.

Documentation websites are sometimes the day-to-day operational element of an organisation where there is a relationship perhaps between a Data Explorer and a DevOps engineer, a data engineer or an Information Technology (IT) team, for example.

And then the interactive dashboards were born out of a need to communicate effectively, going back right to the beginning of how does an analyst communicate with business stakeholders?

And Streamlit is a good example, but has limitations.

But in that space, we have increasingly more and more options.

We have TaiPy, or if you like to pronounce NumPy as Num-Pee, then Tai-Pee, I guess.

But TaiPy is one example.

I mentioned Shiny for Python on the computational side.

If you want to create web applications or websites with pure Python, then Reflex is another good example.

But in all of these, they have the same fundamental issue.

They have their own themes.

They look like external tools.

And if you’re in the unfortunate position where a stakeholder really can’t see past the colouring and the formatting that you’re using to communicate something, then that can be quite difficult.

So one of the fixes here, clearly, is to align the styles and the fonts to meet the brand.

And it shouldn’t be true, but it is often, unfortunately, very true in some corporate environments.

That is the only way that people will focus on the message you’re trying to deliver.

So in some ways, this is the context for this discussion today.

Importance of Fonts

We’re looking at fonts because fonts are a way of communicating a designer’s intent.

So not only is it important or can be unusually important when presenting interactive or computational websites, but also it’s equally applicable if you want to host a website, if you want to host your own documentation site to advertise your portfolio of skills or projects that you have completed.

So the aim here is ultimately we want to provide a consistent experience to our web users.

And when it comes to font selection, you have two choices.

The synopsis for this discussion is, your two choices are that you can either choose a font most likely installed on a user’s machine, or you can specify a font that could be downloaded when the browser or when the webpage itself is loaded.

But that has potential privacy concerns, in particular, the General Data Protection Regulation from within the EU, the GDPR, and more recently, the California Consumer Privacy Act, the CCPA.

Unfortunately, it’s been demonstrated that a popular external font provider, Google Fonts, is non-compliant.

So what are developers to do?

The purpose, hopefully, of this discussion is to go through what those concerns are.

And if you want to skip ahead, you can go to the website and you can see the code you need.

But really the intention behind this discussion is to try and convey the mentality and the process in which we can solve similar problems.

So the focus is on Web Fonts, but it can extend to other external resources.

The methodology is the same.

The regulations are the same, certainly.

And so our approach needs to be flexible for meeting all of these concerns.

With that said, I want to move on to the second part of our discussion, which is around introduction to fonts on websites.

Fonts on Websites

A very long time ago, the HyperText Markup Language, HTML, originally included a font tag, but that was deprecated in favour of using Cascading Style Sheets (CSS).

And there has been a movement within the software industry around the concept of “Separation of Concerns”, the idea that the HTML will provide the semantic layer, like the structure of a document, through tags and Cascading Style Sheets (CSS) would look after, as the name suggests, the style elements of the website or the web application.

So to try and understand a little bit about what is the issue here, I want to focus just a tiny bit on a minimal example.

Now, this is an audio experience, so there is no code for me to show you, but it’s very trivial for you to find out the basic structure of a HTML5 page.

And there are many examples of what the minimum tags should be, and that’s something that I think you can very easily Google.

The bit I want to focus on is how do we change the font?

Well, the font is specified if using Cascading Style Sheets (CSS), using a property called font-family.

And a font-family, as the name suggests, allows a user to provide a series of fonts, which are comma separated.

If the font name includes spaces, then it should be enclosed in quotes.

But for me, best practice is to always use quotes and then you never forget, for example.

So in our simplest case, we might choose a font family consisting of Arial, the font Arial.
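
As a minimal sketch of that simplest case (the `body` selector is an assumption chosen for illustration; any selector would do):

```css
/* Apply a single font to the whole page. The name is quoted even though
   "Arial" contains no spaces, following the always-quote habit above. */
body {
  font-family: "Arial";
}
```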

So most web browsers now contain tooling to explore and debug websites.

These are commonly referred to as the Developer Tools.

So for example, within Firefox, you can right click on any element on any web page and then click on the Inspect keyword and it’ll open up the Developer Tools.

The Developer Tools consist primarily of three panels.

On one side is actually the HTML markup.

You can see the structure of the website.

The second panel typically is focused on the styling.

So this is in effect the instruction that the website receives, typically through Cascading Style Sheets (CSS).

And the third panel can have additional details depending on the web browser.

So within Firefox, you can have a look at the layout, which talks about the box model in particular.

You can look at the computed styles, which is to say if you have multiple styles and they override each other, what’s the actual final intended output.

But critically, it includes an additional tab called fonts.

And what that does, it allows us, for example, to look at what actual font the web browser is using.

You may have gathered therefore, there’s a difference between what the website has requested, which is in the computed styles versus what the web browser is actually using.

And the reason for that is, is that if a website specifies a font unavailable to the web browser, the web browser will use its internal defaults as substitutions.

What that means in practice is if we choose a font not available on the user’s machine, they will get a very different experience to the one that we have designed or intended to.

So as an example, supposing we go to a font provider, such as Google Fonts, and we think that our website, in terms of its heading, needs a more interesting font, and we select something like “Holtwood One SC”, or in fact any of the myriad of fonts available on Google Fonts.

Our initial instinct is to update our Cascading Style Sheet (CSS) for the H1 tag.

We want to change the font-family, and we put in our fonts of choice.

But if we update that, we will discover that unless we chose something which exists on our machine itself, we get a very different result.

And if we use the Developer Tools, that will simply confirm, using the Fonts tab or something of equivalent, that certainly for Firefox, on Windows, that default font is Times New Roman, which is clearly not what we intended.

So that’s a very simple way of illustrating the importance of choosing the right font.

The second aspect that we need to consider is that most browsers support a fallback family, typically serif or sans-serif.

So it’s good practice when specifying a font-family that we put the font we want at the beginning, in order of precedence, but the last entry should be either serif or sans-serif as a fallback mechanism.
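
A sketch of that ordering, assuming the heading should use the Google Fonts family “Holtwood One SC” discussed above and that Georgia is an acceptable locally installed alternative (both choices are illustrative):

```css
/* Fonts are tried left to right. The generic keyword `serif` comes last
   and is deliberately unquoted, because it is a CSS keyword rather than
   the name of a specific font. */
h1 {
  font-family: "Holtwood One SC", "Georgia", serif;
}
```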

This then leads to the next reasonable question, which is, well, if we’re reliant on fonts being installed on the user’s machine, which font should we go for?

Well, we are not the first people to have asked that question.

Back in the 1990s, Microsoft undertook a study to try and figure out what was installed on most people’s machines.

And historically, there is what I like to refer to as the classic Web Safe fonts which, depending on the source, are fonts such as Arial, Courier New, Georgia, Times New Roman, and so forth.

For many of you, you may recognize those fonts as being predominantly Microsoft Windows driven.

And in some ways, they do not reflect the dominance of mobile devices, which typically are the flavors of iOS from Apple or Android led by Google.

There are some other tools which I can provide links to, which provide greater granularity in figuring out which fonts are installed where.

But even the most popular classic Web Safe font, Arial, cannot be guaranteed to be installed on every machine.

For example, some Linux distributions do not have Arial installed as default.

The user must go through a process of installing that.

If that’s the case, can we leverage existing work which might tell us what Web Fonts are potentially useful?

And one mechanism is to recognize that this problem has been solved by other people.

There are such things as CSS Frameworks.

And these frameworks try and create so-called components in the web sense, which is a grouping of styles to best reflect, say, tables or buttons or forms.

And one such framework is called Bootstrap, a very popular CSS Framework.

Within that, they define a set of variables and font-family for sans-serif fonts.

And it contains 10 to 15 different fonts, where Arial actually is six or seven down.

And that’s because operating systems, both mobile and on desktop, have moved forward such that they have their own default font for user interfaces, which CSS can request through the system-ui keyword.

So there are some existing font stacks, that’s the phrase used for a collection of fonts, which we could try and use.

But even then, the nature of a font-family is that if the first font is unavailable, the second font is used; if the second font is unavailable, the third is used; and so forth.

So we still cannot guarantee consistency between our users.
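
To illustrate what such a font stack can look like, here is a sketch loosely modelled on the system font stacks that CSS Frameworks ship. The exact fonts and their order are assumptions for illustration and should not be read as Bootstrap’s actual definition:

```css
/* Illustrative system font stack (hypothetical selection and order).
   `system-ui` asks for the platform's own interface font; the named
   fonts cover platforms where that keyword is unsupported, and the
   generic `sans-serif` keyword is the final fallback. */
body {
  font-family: system-ui, "Segoe UI", "Roboto", "Helvetica Neue",
    "Noto Sans", "Liberation Sans", "Arial", sans-serif;
}
```

Each visitor gets the first font in the list that their device can supply, which is exactly why the result can still differ from one user to the next.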

Consistent Font Experience

So the options available to us are threefold.

The first is we can link to a font asset from an external provider.

The second is we can import a web font from an external provider using Cascading Style Sheets (CSS).

And the third is we can host the font ourselves and make it available to the user.

But even in all those scenarios, we still can’t ultimately guarantee that the end user will be able to use that font.

If you’ve worked in a corporate environment, you may know there are company security or IT policies that can prevent access to fonts in particular from external providers, but even in some instances from internal providers due to the fear of cybersecurity issues.

The provision of a font from an external provider utilises the web font technology.

Since 2010, major web browsers have supported the @font-face rule of Cascading Style Sheets (CSS) and the associated @import rule as a means of importing a font or a stylesheet not available on the user’s machine.

So in practice, this means you can go to an external font provider such as Google Fonts, have a browse, select the font you want, press get and have two options for utilising that font in your website or your web application.

The first is to provide a link to the external asset using the link tag, the link HTML tag which sits within the head section of the document.

The second approach is to use a Cascading Style Sheets (CSS) native rule called @import, where you provide the URL to the same asset.

Both of those methods will provide the required output and hey presto, you now have a consistent experience amongst all your users.
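
As a sketch of the second approach, the CSS import rule might look like the following. The URL follows the pattern Google Fonts generates, but the exact query string is an assumption; copy the one the provider gives you. The first approach places the same URL in a link tag in the head section of the HTML document instead.

```css
/* @import rules must appear before any other style rules in the stylesheet. */
@import url("https://fonts.googleapis.com/css2?family=Holtwood+One+SC&display=swap");

/* Once imported, the font can be referenced like any installed font. */
h1 {
  font-family: "Holtwood One SC", serif;
}
```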

Unfortunately, the most popular provider, Google Fonts, was found by a German court to be non-compliant with the GDPR.

As such, web developers have a bit of a conundrum.

You can either seek consent before you dynamically load the font from Google Fonts or you must use an alternative mechanism and the purpose of this discussion is to explore what those mechanisms might be.

Privacy Compliant Web Fonts

So this then leads into the topic of this episode: Privacy Compliant Web Fonts.

We’ve discussed the importance of the fonts for consistency.

We’ve recognised the challenges on relying on fonts only available on the user’s machines.

We’ve understood the convenience of using Web Fonts, but they ultimately can mean interacting with an external provider and therefore you must be considerate of their privacy policy, and in terms of Google Fonts they are not compliant with GDPR.

So therefore you have two broad choices.

The first is to find an alternative external provider and the second is to self-host.

Both of which sound tricky at first.

If we look at external providers there are two providers that I want to discuss.

The first is Bunny Fonts, which advertises itself as fully GDPR compliant.

So a quote from their website is that “Bunny Fonts is an open source privacy first web font platform designed to put privacy back into the internet with a zero tracking and no logging policy.

Bunny Fonts helps you stay fully GDPR compliant and puts your users personal data in their own hands”.

You can also enjoy “lightning fast load times thanks to Bunny.net’s global CDN network to improve SEO which is search engine optimization and deliver a better user experience”.

In terms of using Bunny Fonts, in effect anywhere where it says “googleapis.com” could be replaced with “bunny.net” and they provide an example and the interface is very similar to Google.

And both methods we talked about previously, whether linking to an external style sheet using the link HTML tag or using the CSS import rule, are equally effective.
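
A sketch of that swap using the CSS import rule. The Bunny Fonts URL below is an assumption based purely on the hostname replacement described above; the family and weight syntax may differ, so copy the exact URL from the provider’s own interface.

```css
/* Before: font stylesheet loaded from Google Fonts.
@import url("https://fonts.googleapis.com/css?family=Holtwood+One+SC");
*/

/* After: only the hostname changes (assumed URL pattern). */
@import url("https://fonts.bunny.net/css?family=Holtwood+One+SC");

/* The rest of the stylesheet is unchanged. */
h1 {
  font-family: "Holtwood One SC", serif;
}
```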

The second option as an external provider is using Cloudflare Fonts.

So Cloudflare is a global Content Delivery Network (CDN), and Cloudflare Fonts focuses on improving performance and eliminating privacy concerns from Google Fonts usage.

You need to have the website provisioned through Cloudflare itself and there is a button which you can activate that automatically rewrites Google Font links and imports and serves them locally from within Cloudflare.

So in effect acting as a barrier between the user and Google itself.

But this has many limitations.

It’s not as open as using Bunny Fonts, which can be used anywhere, versus for example having an account on Cloudflare itself for the same feature.

In both those examples we’re still relying on an external party.

We’re also trusting that party to do what they say they’re going to do.

However there is no external validation.

There’s no way we can guarantee that they aren’t actually tracking things when they say they aren’t.

And just to be clear I’ve got no evidence to the contrary but it’s a thought process that we must undertake.

It’s a risk that we need to manage.

So the second broad option is self-hosting.

When the web font technology was incorporated into web browsers equally within that was support for a different font format called the Web Open Font Format or WOFF.

The more recent standard is WOFF2 and what that allows is for web browsers to very efficiently load fonts from a particular host, which could be an internal host or an external host.

But how do we go from having found a font we like on Google or Bunny to a self-hosting situation?

Well there are tools and technologies depending on your platform.

So within, for example, the RMarkdown world there are packages you can get for RStudio that do that for you.

But the one I want to highlight is something called “Google Web Font Helper”.

This is a service where you select the font you want again a similar interface to Google Fonts and it then provides a four-step process for self-hosting.

The salient steps for us are steps three and four.

So step three is about generating a Cascading Style Sheet (CSS) definition for @font-face, which we can copy and paste into our project, and the fourth step is then downloading a converted font file.

So it’s taken the font from Google or Bunny or whatever the equivalent is and converted it into the WOFF2 format ready for self-hosting, and it walks you through the steps necessary to do that.

So in that instance we take the zip file, unzip it into a directory of our choice, open up the Cascading Style Sheet (CSS), copy and paste in the code we got from the Google Web Fonts Helper, and then style the HTML elements as we have done before. We end up with the same result: a website or web application that provides a consistent experience for all our users.
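
A sketch of what the end result of those steps might look like. The file path and file name are hypothetical and depend on where the zip file was extracted; the @font-face block itself would normally be copied from the helper tool rather than written by hand.

```css
/* Register the self-hosted font file under a family name. */
@font-face {
  font-family: "Holtwood One SC";
  font-style: normal;
  font-weight: 400;
  font-display: swap; /* show fallback text while the font file loads */
  /* Hypothetical path, resolved relative to this stylesheet. */
  src: url("fonts/holtwood-one-sc-regular.woff2") format("woff2");
}

/* Style elements exactly as before; the browser now fetches the font
   from our own host rather than from an external provider. */
h1 {
  font-family: "Holtwood One SC", serif;
}
```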

It’s worthwhile noting that although this is probably the most practical approach to ensure a user has access to the fonts, there are still those edge cases I mentioned previously in certain corporate environments where security or IT policies might prevent the use of those fonts. Nonetheless, we have a privacy compliant way of providing a custom font.

So in practice what does that mean in terms of the three categories of websites we’ve talked about previously?

We now understand that we can customize static websites. Documentation sites like MkDocs or Docusaurus have a feature for allowing custom style sheets, so we can copy and paste the code we got from the Google Web Fonts Helper website into those areas.

If you’re using Plotly or Shiny, they also support custom Cascading Style Sheets (CSS). Computational platforms like Quarto and Jupyter Book can be customized in a similar way.

So I want to conclude by saying that in today’s episode hopefully we’ve illustrated the unusual reasons why it’s important to be able to customize fonts and the look of our communication medium, which typically and increasingly are websites and web applications.

We’ve highlighted why it’s difficult to convey the designer’s intent if we’re relying on fonts installed on the user’s machines.

We’ve highlighted a common practice of linking or utilizing an external provider such as Google Fonts but there are associated privacy concerns.

We then moved on to discuss how to address those concerns, either using an alternative external provider such as Bunny Fonts or Cloudflare Fonts or, alternatively, by self-hosting those fonts and therefore mitigating some of these concerns.

Finally, for all of the frameworks that I’ve discussed, I’ve hopefully reinforced in your mind that they have the mechanism to allow you to define what that font should be and equally the colours that go with it. Therefore, hopefully, as Data Explorers, when you’re communicating the output of your analysis it can now be brand compliant, which may seem silly, but for a lot of people and for many stakeholders it can be a barrier and a distraction from understanding the message.

Attribution

Profile images based on:

Music and audio clips from:


Citation

BibTeX citation:
@online{2024,
  author = {, miah0x41},
  title = {Podcast \#1: {Privacy} {Compliant} {Web} {Fonts}},
  date = {2024-05-02},
  url = {https://blog.curiodata.pro/posts/04-pod-episode1},
  langid = {en}
}
For attribution, please cite this work as:
miah0x41., “Podcast #1: Privacy Compliant Web Fonts,” May 02, 2024. Available: https://blog.curiodata.pro/posts/04-pod-episode1