Data Supplies
Posted on November 13, 2011
If your organisation receives data supplies from a vendor, or data shared by another organisation, it is worth considering what changes you could make to ensure that the data is as useful as it can be. Supplied data often comes with a lot of assumptions. Because it needs to be versatile enough for many organisations, it often contains intricacies that are unhelpful or that actively get in the way. It may have come from a legacy system with a flawed design, or be formatted for a commonly used legacy target system.
There is a cost that your organisation is exposed to for handling this data. Firstly, it may place limitations on how you can use the data or what you can use it for. Secondly, any workarounds that you design to use the data effectively have to be re-performed every time you want to use the data, which is inefficient.
You can choose to remodel and simplify the data, designing a geospatial repository that meets your needs. It also means designing a transformation that can be performed repeatedly. An example might be to take Road Centreline data and split it into three sets: Navigable Road Centrelines, Tracks and a separate State Highway layer. Doing this has a number of advantages. It ultimately saves time and allows you to easily do what was difficult or unworkable before. Because the data will be easier to use every time you go to use it, you are more likely to make use of it.
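As a rough illustration of a repeatable transformation, the sketch below splits road centreline records into the three sets mentioned above. The field names `road_type` and `state_highway` are hypothetical, as is the rule that each record lands in exactly one layer; a real supply will have its own attributes and your own rules may differ.

```python
def split_road_centrelines(features):
    """Split road centreline records into three layers.

    Assumes each record is a dict with hypothetical fields:
      'state_highway' - a highway identifier, or None/empty if not a highway
      'road_type'     - 'track' for tracks, anything else for ordinary roads
    """
    layers = {"state_highways": [], "tracks": [], "navigable_roads": []}
    for feature in features:
        if feature.get("state_highway"):
            # Any record carrying a highway identifier goes to its own layer.
            layers["state_highways"].append(feature)
        elif feature.get("road_type") == "track":
            layers["tracks"].append(feature)
        else:
            # Everything else is treated as a navigable road (an assumption).
            layers["navigable_roads"].append(feature)
    return layers


# Example supply; the same function can be re-run on every new delivery.
supply = [
    {"id": 1, "road_type": "road", "state_highway": "SH1"},
    {"id": 2, "road_type": "track", "state_highway": None},
    {"id": 3, "road_type": "road", "state_highway": None},
]
layers = split_road_centrelines(supply)
```

Because the rules live in one function, the split is applied the same way to each supply rather than being redone by hand.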
Pre-processing in this way has the advantage of removing bottlenecks. If you have already transformed your data, you have reduced much of the work needed to translate it through to your production systems, and you have had the opportunity to identify errors or inconsistencies at the time of receipt rather than when the pressure is on to produce outputs. Maintain a good relationship with your provider by actively providing feedback in the form most useful to them. Only code in data fixes as a last resort; it is much better to feed data problems back to your supplier for them to fix. Hard-coded data fixes can come unstuck if the supplier corrects the data and you are still applying fixes to it. A good relationship and a data supply agreement also mean that you should expect a consistent data supply. If this is not happening, negotiate such an agreement with your supplier.
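If a last-resort fix is unavoidable, one way to reduce the risk described above is to apply it only while the known bad value is still present. This is a minimal sketch of that idea, not a prescribed method; the record layout and the fix tuple format are assumptions made for illustration.

```python
def apply_guarded_fixes(records, fixes):
    """Apply each fix only if the record still holds the known bad value.

    records: dict mapping record id -> dict of attributes (hypothetical layout)
    fixes:   list of (record_id, field, bad_value, good_value) tuples
    Returns (applied, stale): fixes that were applied, and fixes that no
    longer match the data and should be reviewed or retired.
    """
    applied, stale = [], []
    for record_id, field, bad_value, good_value in fixes:
        record = records.get(record_id)
        if record is not None and record.get(field) == bad_value:
            record[field] = good_value
            applied.append((record_id, field))
        else:
            # The supplier may have corrected or removed the record.
            stale.append((record_id, field))
    return applied, stale


records = {
    1: {"name": "Mian St"},   # still wrong in this supply
    2: {"name": "High St"},   # already corrected by the supplier
}
fixes = [
    (1, "name", "Mian St", "Main St"),
    (2, "name", "Hihg St", "High St"),
]
applied, stale = apply_guarded_fixes(records, fixes)
```

The list of stale fixes doubles as a prompt to remove fixes the supplier has already dealt with.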
There are exceptions to most rules, and one case where you may choose not to change your model is when you are detecting changed data and applying only the changes. In such cases, take care that a data model change does not make this a more difficult task.
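To make the change-detection case concrete, here is a small sketch that compares two supplies keyed on a supplier identifier. The dict-of-records layout and the existence of a stable supplier key are assumptions; if a remodelled repository does not retain that key, matching changes back to your own records becomes harder, which is the difficulty noted above.

```python
def detect_changes(previous, current):
    """Compare two supplies keyed by a stable supplier id (assumed to exist).

    previous, current: dicts mapping supplier id -> record (any comparable value)
    Returns three dicts: added, removed and changed records.
    """
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = {k: previous[k] for k in previous.keys() - current.keys()}
    changed = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return added, removed, changed


last_supply = {"A1": {"name": "Main St"}, "A2": {"name": "Hill Rd"}}
this_supply = {"A2": {"name": "Hill Road"}, "A3": {"name": "Bay Track"}}
added, removed, changed = detect_changes(last_supply, this_supply)
```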