Re: New to WSPR Daemon

Greg Beam <ki7mt01@...>

Hi Gwyn,

I should probably clarify the use of these types of tools a bit more so as not to confuse folks. What I've added so far targets WSPRnet CSV files. I'll be adding the same or similar for WSPR Daemon schemas.

The primary purpose of Spark is to Map Reduce a given DataFrame / DataSet. Say, for example, you have a year's worth of spot data and want to plot, compare, or otherwise process it. The steps would go something like:

  • Select just the columns you want from the Parquet Partitions (timestamp, field1, field2, field3, etc)
  • Perform the aggregations (SUM, AVG, Count, STATS, Compare, or whatever you need)
  • Plot or save the results to csv/json or post to a DB Fact Table.
The plot or save stage is where the performance increase comes in, as it's all done in parallel on a Spark cluster (standalone or nodes). While this doesn't sound overly impressive, it is. Consider the November 2020 WSPRnet CSV file. It has 70+ million rows of data * 15 columns. When one adds the remainder of the year, you could easily be over 500 million rows of data. Doing aggregate functions on datasets of that scale can be very expensive time-wise. If one has 20 or so results they want to process every day of every month, down to the hour level, in a rolling fashion, it becomes impractical to do in a single-threaded call.
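
To make the select / aggregate / save steps a bit more concrete, here is a minimal sketch in plain Python. It is not Spark code and not what my tools do internally; it only mimics the shape of the work: each partition is reduced to partial counts on its own (the part Spark would run in parallel), and the partials are then merged. The sample rows, the column layout (timestamp, band, snr, plus one unused column), and all function names are made up for illustration. In Spark the same shape would roughly correspond to a select, a groupBy, and an agg on a DataFrame.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone
from functools import reduce

# Hypothetical spot rows, split into two "partitions":
# (unix timestamp, band, snr, a column we do not need)
PARTITIONS = [
    [(1604188800, 20, -12, "x"), (1604189400, 20, -18, "x")],
    [(1604192400, 40, -5, "x"), (1604188860, 40, -9, "x")],
]

def hour_key(ts):
    """Truncate a unix timestamp to its UTC hour."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:00")

def map_partition(rows):
    """Keep only timestamp, band, snr; build partial [count, snr_sum] per (hour, band)."""
    partial = defaultdict(lambda: [0, 0])
    for ts, band, snr, _unused in rows:
        acc = partial[(hour_key(ts), band)]
        acc[0] += 1
        acc[1] += snr
    return dict(partial)

def merge(left, right):
    """Combine the partial results of two partitions."""
    merged = {key: list(value) for key, value in left.items()}
    for key, (count, snr_sum) in right.items():
        acc = merged.setdefault(key, [0, 0])
        acc[0] += count
        acc[1] += snr_sum
    return merged

def aggregate(partitions):
    """Map every partition, reduce the partials, then finish the average."""
    totals = reduce(merge, map(map_partition, partitions), {})
    return {key: {"count": c, "avg_snr": s / c} for key, (c, s) in sorted(totals.items())}

def to_csv(result):
    """Save stage: render the aggregated rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["hour", "band", "count", "avg_snr"])
    for (hour, band), stats in result.items():
        writer.writerow([hour, band, stats["count"], stats["avg_snr"]])
    return buf.getvalue()

RESULT = aggregate(PARTITIONS)
```

Because map_partition only ever looks at its own rows, the partitions can be handed to different workers and merged afterwards, which is where a cluster pays off at the 500 million row scale.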

I've not added any Streaming Functions, but Spark also allows for continuous ingestion of data from file/folder monitoring, UDP ports, channels, and others. I can see many use-cases with WSPR Daemon and Spark Stream Processing of spot data from multiple Kiwis with multi-channel monitoring on each device. You could use it to process the data, or simply post it to a staging table for downstream analytics. Staging data for downstream activity is commonly used for things like server logs or web-page clicks from millions of users. However, it doesn't matter what the source data is, only that it's coming in on intervals or continuously.
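
As a rough illustration of the folder-monitoring and staging idea, here is a small plain-Python sketch. Again, this is not Spark and not anything I've shipped; it just shows the pattern of picking up files that have appeared since the last pass and appending their rows to a staging area. The function name, the in-memory list standing in for a staging table, and the assumption that spots arrive as *.csv files are all hypothetical.

```python
import csv
from pathlib import Path

def ingest_new_files(folder, seen, staging):
    """Append rows from not-yet-seen *.csv files in folder to staging.

    seen    -- set of file names already ingested (updated in place)
    staging -- list standing in for a staging table (updated in place)
    Returns the number of rows added on this pass.
    """
    added = 0
    for path in sorted(Path(folder).glob("*.csv")):
        if path.name in seen:
            continue  # already staged on an earlier pass
        with path.open(newline="") as handle:
            for row in csv.reader(handle):
                staging.append(row)
                added += 1
        seen.add(path.name)
    return added
```

Calling this on a timer gives interval-based ingestion; a stream processor does the same bookkeeping for you and adds the parallelism and fault tolerance on top.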

If you're into Machine Learning and predictive analytics, Spark MLlib also provides a powerful set of tools.

Essentially, Spark provides (in cluster or standalone mode):

- DataSet / DataFrame Map Reduction capabilities
- Stream Processing of data and files
- Machine Learning tests and predictive analytics

Greg, KI7MT
