Sport Data Resources

A page where I’m collecting useful links, references and resources.

Open Data

We may want to compare our data to a national picture, or combine external datasets with ours to enhance the data. Here are some open datasets that might be useful.

Working with data

Dimensional Modelling

Dimensional Modelling is a way of thinking about data and structuring it to make it easier to use to answer questions. The core idea is to separate measurements of events (“facts”) from the descriptive context you use to group and filter them (“dimensions”). It is often mentioned in relation to building “data warehouses” but I think it’s a useful way to organise your data even if you’re the only person using it (for example in Excel sheets, or as data frames in R/Python).
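As a rough illustration of the idea, here is a minimal sketch of a dimensional layout using Python's built-in sqlite3 module. One table holds measurements (a "fact" table of session attendance) and another holds the descriptive context used to group them (a "dimension" table of clubs). The table names, columns and figures are all made up for the example.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive attributes of each club (hypothetical data).
    CREATE TABLE dim_club (
        club_key  INTEGER PRIMARY KEY,
        club_name TEXT,
        region    TEXT
    );
    -- Fact: one row per club per session, holding the measurement.
    CREATE TABLE fact_session (
        session_date TEXT,
        club_key     INTEGER REFERENCES dim_club(club_key),
        attendees    INTEGER
    );
    INSERT INTO dim_club VALUES
        (1, 'Club A', 'North'),
        (2, 'Club B', 'South'),
        (3, 'Club C', 'North');
    INSERT INTO fact_session VALUES
        ('2024-01-06', 1, 12),
        ('2024-01-06', 2, 8),
        ('2024-01-13', 1, 10),
        ('2024-01-13', 3, 5);
""")

# A typical question: total attendance by region.
# The fact table supplies the numbers, the dimension supplies the grouping.
attendance_by_region = conn.execute("""
    SELECT d.region, SUM(f.attendees)
    FROM fact_session AS f
    JOIN dim_club AS d ON d.club_key = f.club_key
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()

print(attendance_by_region)
```

The same shape works as two sheets in a spreadsheet or two data frames: keep the measurements in one place, the descriptions in another, and join them on a shared key when you need to answer a question.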

Dimensional modelling approaches also have common patterns for structuring data to deal with specific problems. For example (from a conversation I had with England Squash) you can track the changes in data over time (for example to compare a period this year with a period last year) using Slowly Changing Dimensions.
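To make that concrete, here is a minimal sketch of one common variant, usually called a "Type 2" Slowly Changing Dimension, where every change adds a new dated row instead of overwriting the old value. The MemberVersion class, its fields and the membership tiers are hypothetical names chosen for illustration, not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MemberVersion:
    """One row of a hypothetical Type 2 member dimension."""
    member_id: int
    membership_tier: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(rows, member_id, new_tier, change_date):
    """Close the member's current row (if any) and append a new one."""
    for row in rows:
        if row.member_id == member_id and row.valid_to is None:
            if row.membership_tier == new_tier:
                return  # nothing changed, so keep the current row
            row.valid_to = change_date
    rows.append(MemberVersion(member_id, new_tier, change_date))


def tier_on(rows, member_id, on_date):
    """Return the tier in effect on on_date, or None if there was none."""
    for row in rows:
        if (row.member_id == member_id
                and row.valid_from <= on_date
                and (row.valid_to is None or on_date < row.valid_to)):
            return row.membership_tier
    return None


# A member joins as a junior, then moves up to senior a year and a half later.
rows = [MemberVersion(1, "junior", date(2022, 1, 1))]
apply_change(rows, 1, "senior", date(2023, 6, 1))

print(tier_on(rows, 1, date(2022, 7, 1)))
print(tier_on(rows, 1, date(2023, 7, 1)))
```

Because the old row is kept, a report for a period last year sees the member as they were then, not as they are now.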

I like the book Agile Data Warehouse Design (Corr, Stagnitto).

Data Architecture

Facades

Sometimes you want to change part of your system. It’s typical to do this in a “big bang” fashion (e.g. build or commission a new system and then switch over to it on a defined date). This has risks: the new system might take a long time to deliver, so changes in your organisation or the world could invalidate it before you deploy it. Migrating old data to the new system could also be very complicated, meaning that users have to wait a long time to realise the benefits.

One possible solution is to introduce a Facade: a simpler piece of software that presents a common interface while you run the old and new systems in parallel. For example, if you’re commissioning a new membership system to replace an old one, you might introduce a facade so that your competition system can ask the question “is this person a member?” and have that question delegated to the old or new system as appropriate, hiding the complexity of the two systems. This allows you to gradually move the old members to the new system, while new members join using the new system. Once the old system is fully migrated it can be decommissioned, and the facade can potentially be removed.
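The membership example could be sketched roughly like this. The two system classes are hypothetical stand-ins with deliberately different method names, to show that callers only ever need to know about the facade's single is_member question. A real facade would call each system's actual API or database instead.

```python
class OldMembershipSystem:
    """Stand-in for the legacy system (hypothetical interface)."""

    def __init__(self, member_ids):
        self._member_ids = set(member_ids)

    def has_member(self, member_id):
        return member_id in self._member_ids


class NewMembershipSystem:
    """Stand-in for the replacement system (hypothetical interface)."""

    def __init__(self):
        self._members = {}

    def register(self, member_id, name):
        self._members[member_id] = name

    def is_registered(self, member_id):
        return member_id in self._members


class MembershipFacade:
    """Single entry point that hides which system holds a member."""

    def __init__(self, old_system, new_system):
        self._old = old_system
        self._new = new_system

    def is_member(self, member_id):
        # Ask the new system first, then fall back to the old one
        # for members who have not been migrated yet.
        return (self._new.is_registered(member_id)
                or self._old.has_member(member_id))


old = OldMembershipSystem(member_ids=[101, 102])
new = NewMembershipSystem()
new.register(201, "New joiner")
facade = MembershipFacade(old, new)

# The competition system only ever talks to the facade.
print(facade.is_member(101))
print(facade.is_member(201))
print(facade.is_member(999))
```

As members are migrated, more lookups are answered by the new system without the competition system changing at all. Once the old system is empty, the fallback (and eventually the facade itself) can go.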

Database choices

Most cloud vendors have “data warehouse” products (e.g. Amazon Redshift, Google BigQuery, Azure Synapse). Remember, these databases are designed to process huge volumes of data efficiently, and your organisation may not need that amount of power. The disadvantage of using a proprietary solution is that you are locked into a specific vendor (or a third-party supplier who uses that vendor). Most cloud platforms also allow you to host open source databases (such as PostgreSQL), which may be a more appropriate choice for the data volumes you are working with and are easier to move between vendors/hosts.