Managing edge data in the 5G era


In the IT sector, things tend to go in waves. Over the past few years, the emphasis on cloud computing has seen a lot of centralisation take place. Rather than running their own data centres, companies have been adopting public cloud services to make their applications scale and to reduce their costs.

However, at the same time, there has been an undercurrent of computing moving in the other direction. Rather than being centralised, computing can move to individual devices that carry out specific tasks, particularly tasks that cannot tolerate the round trip between a device and the cloud. Edge computing is growing as more 5G use cases develop.

Whether it is the future connected car, supply chain operations directed by data from sensors in shipping containers, or more straightforward Internet of Things devices, edge computing takes a different approach. 5G networks can support more edge devices connecting to the Internet in the same location, while also increasing the volume of data that can be shared.

From a data perspective, all these devices will be creating data all the time. So how should we go about storing and using that data over time, especially as 5G will make connectivity even more ubiquitous than it is today?

Data at the edge

"If each edge device is self-contained, then simple storage on the device itself can be used. However, for most enterprise use cases, the number of devices will be distributed and companies will want to see the status of all those devices over time."

Patrick Callaghan, DataStax.

Edge computing is fundamentally about serving a need locally to a user, whether that is taking data from sensors and providing an automated response in real time, or carrying out complex processing where the latency of a round trip to the cloud would be too great. The latter category is particularly important for projects around connected vehicles, where latency would affect safety.

Whatever the size of your project, managing the data it creates will be an ongoing requirement. There are several variables to consider: how much data is being created, how quickly it is created, and what it will be used for over time. For anything beyond simple responses, that data will have to be stored.

If each edge device is self-contained, then simple storage on the device itself can be used. However, for most enterprise use cases, the devices will be distributed and companies will want to see the status of all of them over time. This will therefore involve Internet connections, with each device providing data back to a central service.

To manage this at the edge, you need a way to capture and store the data each device creates, and then send it back to the central service so it can be analysed. There are two ways to architect this: use separate data management approaches on the remote devices and the central service, or use the same approach across both.

Both approaches can have issues. Storing data locally and then sending it to a different service will involve a lot of extract, transform and load work, which can add substantial management and update overhead for the team. However, running data management in a distributed environment has its own challenges, too: a fully distributed set of devices will need a fully distributed data strategy.

Similarly, if you intend to use data for analysis at scale then it should be structured. Gathering lots of data points together can be useful, but unstructured data is not that valuable on its own. Consider a time-series approach where the order of events in sensor data will be valuable, and implement a database to make it easier to store all the data over time. Ideally, this would be a fully distributed database that can run across multiple locations, making it easier to organise and store all your edge computing data.

This is particularly important when you have devices that will update regularly, or where thousands or millions of data points will be created every day. Sorting and storing this data in a database will provide structure and make it more efficient to analyse over time. 
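As a minimal sketch of that time-series pattern, the Python example below keys each reading by device and timestamp. It uses the built-in sqlite3 module purely as a stand-in for whatever database runs at the edge; the table name and columns are illustrative assumptions rather than part of any particular product.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in local store; in production this could be any edge-capable database.
conn = sqlite3.connect("edge_readings.db")

# Time-series layout: one row per reading, keyed by device and timestamp,
# so queries like "all readings for device X over the last hour" stay cheap.
conn.execute("""
    CREATE TABLE IF NOT EXISTS sensor_readings (
        device_id   TEXT NOT NULL,
        recorded_at TEXT NOT NULL,   -- ISO-8601 UTC timestamp
        metric      TEXT NOT NULL,   -- e.g. 'temperature'
        value       REAL NOT NULL,
        PRIMARY KEY (device_id, recorded_at, metric)
    )
""")

def record_reading(device_id: str, metric: str, value: float) -> None:
    """Append one structured data point for later analysis."""
    conn.execute(
        "INSERT OR REPLACE INTO sensor_readings VALUES (?, ?, ?, ?)",
        (device_id, datetime.now(timezone.utc).isoformat(), metric, value),
    )
    conn.commit()

record_reading("container-0042", "temperature", 4.7)
```

The important design choice here is the key: grouping rows by device and ordering them by time is what keeps the data structured enough to analyse efficiently later.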

Data in the middle

"As 5G makes edge computing more useful and allows more use cases to be delivered, the data involved will be more important to manage."

Patrick Callaghan, DataStax.

Alongside data on the device and in the centre, there is the process of getting data from those devices back to the centre. There are some design and architecture choices to consider here. For example, from a bandwidth and power perspective, sending data in real time whenever something changes is more expensive than running scheduled transfers at regular intervals. If your use case relies on analysing real-time data as it is created, then this should be factored in from the beginning. Even though 5G is more powerful than previous networks, that additional capacity has its own overhead to consider.
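As a rough sketch of that trade-off, the snippet below buffers readings on a device and either sends each one immediately or flushes the buffer on a fixed schedule. The send_batch callable and the 60-second interval are assumptions for illustration; in practice they would be whatever transport and cadence your use case requires.

```python
import time
from typing import Callable

class Uplink:
    """Buffers readings and sends them either immediately or on a schedule."""

    def __init__(self, send_batch: Callable[[list], None],
                 realtime: bool = False, interval_s: float = 60.0):
        self.send_batch = send_batch      # e.g. an HTTP POST or message publish
        self.realtime = realtime
        self.interval_s = interval_s
        self.buffer: list = []
        self.last_flush = time.monotonic()

    def record(self, reading: dict) -> None:
        if self.realtime:
            # Real-time mode: one transfer per reading, highest bandwidth and power cost.
            self.send_batch([reading])
            return
        # Scheduled mode: accumulate locally, transfer in bulk at intervals.
        self.buffer.append(reading)
        if time.monotonic() - self.last_flush >= self.interval_s:
            self.send_batch(self.buffer)
            self.buffer = []
            self.last_flush = time.monotonic()

# Usage: print stands in for the real network call.
uplink = Uplink(send_batch=print, realtime=False, interval_s=60.0)
uplink.record({"device_id": "container-0042", "temperature": 4.7})
```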

An alternative here is to schedule when data is shared back from the edge devices. There are two approaches you can take: per device, or per batch. With the per-device approach, each device shares a set of data back to a central location at specific intervals for organisation. The other approach is to gather data from a set of devices at a local collection point, which then takes care of sharing that data back to the central location. This intermediate tier can be very powerful when it comes to managing larger volumes of data coming from significant numbers of devices, or where the volume of data points is itself significant.
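A minimal sketch of that per-batch pattern follows, assuming a local gateway process running on the same site as the devices; the forward_to_central callable is a placeholder for whatever mechanism sends data upstream.

```python
from collections import defaultdict
from typing import Callable

class LocalCollectionPoint:
    """Gathers readings from many nearby devices and forwards them in one batch."""

    def __init__(self, forward_to_central: Callable[[dict], None]):
        self.forward_to_central = forward_to_central   # placeholder for the uplink
        self.pending = defaultdict(list)               # device_id -> readings

    def collect(self, device_id: str, reading: dict) -> None:
        # Devices push to the nearby collection point instead of the central service.
        self.pending[device_id].append(reading)

    def flush(self) -> None:
        # One consolidated transfer covering every device at this site.
        if self.pending:
            self.forward_to_central(dict(self.pending))
            self.pending.clear()

site = LocalCollectionPoint(forward_to_central=print)
site.collect("container-0042", {"temperature": 4.7})
site.collect("container-0043", {"temperature": 5.1})
site.flush()
```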

This intermediate approach involves either staging data sets to be passed on, or using the same data management approach across all the devices and data centres. With a distributed database, you can store data on the device, in the local data centre and in the central data centre in the same way.
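As one way to illustrate that "same approach everywhere" option, the sketch below uses Apache Cassandra's multi-datacentre replication through the DataStax Python driver. The contact point, datacentre names and keyspace are made-up examples, and any distributed database with comparable replication controls could fill the same role.

```python
from cassandra.cluster import Cluster

# Hypothetical contact point on the local edge site; replace with your own nodes.
cluster = Cluster(["10.0.0.1"])
session = cluster.connect()

# One keyspace, replicated to both the edge site and the central datacentre,
# so the local and central tiers read and write the same tables in the same way.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS edge_telemetry
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'edge_site_1': 2,
        'central_dc': 3
    }
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS edge_telemetry.sensor_readings (
        device_id   text,
        recorded_at timestamp,
        metric      text,
        value       double,
        PRIMARY KEY ((device_id), recorded_at, metric)
    )
""")
```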

As 5G makes edge computing more useful and allows more use cases to be delivered, the data involved will become more important to manage. While each device might create its own data footprint, that information will have to be stored and analysed to provide value back to those who create it. Whatever your approach, you will have to consider how you manage this data over time. Distributed data approaches can help, whether your use case is simple or complex.

Patrick Callaghan

Patrick Callaghan has been a Technology and Business Strategist at DataStax for over five years. He works with companies implementing and supporting mission-critical applications across hybrid and multi-cloud environments. Prior to DataStax, Patrick held roles in the banking and finance sectors, architecting and developing real-time applications.