Greg Beam <ki7mt01@...>
I should probably clarify the use of these type of tools a bit more so as to not confuse folks. What I've added so far is targeting WSPRnet CSV files. I'll be adding the same or similar for WSPR Daemon Schemas.
Their primary purpose of Spark is to Map Reduce a given DataFrame / DataSet. Say, for example, you have a years worth of spot data and want to plot, compare, or otherwise process. The steps would go something like:
I've not added any Streaming Functions, but, Spark also allows for continuous ingestion of data from file/folder monitoring, UDP ports, channels, and others. I can see many use-cases with WSPR Daemon and Spark Stream Processing of spot data from multiple Kiwi's with multi-channel monitoring on each device. You could use it to process the data, or simply post it to a staging table for downstream analytics. Staging data for downstream activity is a commonly used for things like server logs or Web-Page clicks from millions of users. However, it doesn't' matter what the source data is, only that it's coming in on intervals or continuously.
If you're into Machine Learning and predictive analytics, the Spark ML-Lib provides a powerful set of tools also.
Essentially, Spark provides (in clusters or stand alone modes)
- DataSet / Dataframe Map Reduction capabilities
- Stream Processing of data and files
- Machine Learning tests and predictive analytics