
Power-over-USB for KiwiSDR

Glenn Elmore
 

Attached is a PDF detailing what seems to have worked for me so far to let a Kiwi be run with only a USB connection to a WiFi router. Doing this avoids common-mode current noise from the LAN.
There may be errors in this, but if you care to, try it, and if you have difficulties I'll try to help.

Glenn n6gn


Re: Version 3.0 is coming

Rob Robinett
 

Yes, I will share that list privately.
But I am weeks away from alpha testing 3.0.


Re: Version 3.0 is coming

WA2TP
 

Would you be able to list the call signs of the beta sites?

I would like to be able to account for the increases in spot volume suggested by the new version of WSJT-X, which claims a -2 dB S/N improvement.

TIA.

On Feb 20, 2021, at 1:10 PM, Rob Robinett <rob@...> wrote:

For those of you who don't join our Wednesday 18:00 UTC Zoom call, I thought it would be useful to let you know that I am actively working on a major new version of WD which will include these major new features:

1) Support for the new FST4W transmission modes added in WSJT-x V2.3.0, in particular the 5, 15, and 30 minute packets. Each band configured to receive them will consume 100+ MBytes of space in /tmp/wsprdaemon.
2) Support for additional SDRs via the SoapyAPI service. These include SDRPlay, RTL-SDR, Red Pitaya and many others.

It may be several weeks before I have a v3.0 release candidate running at the beta sites, but I wanted WD users to know that it is still under active development.


Version 3.0 is coming

Rob Robinett
 

For those of you who don't join our Wednesday 18:00 UTC Zoom call, I thought it would be useful to let you know that I am actively working on a major new version of WD which will include these major new features:

1) Support for the new FST4W transmission modes added in WSJT-x V2.3.0, in particular the 5, 15, and 30 minute packets. Each band configured to receive them will consume 100+ MBytes of space in /tmp/wsprdaemon.
2) Support for additional SDRs via the SoapyAPI service. These include SDRPlay, RTL-SDR, Red Pitaya and many others.

It may be several weeks before I have a v3.0 release candidate running at the beta sites, but I wanted WD users to know that it is still under active development.


Re: Using PgAdmin4 for Queries

Gwyn Griffiths
 

Greg
Thank you for your notes on using PgAdmin4 - far from burning you at the stake, I'm a little embarrassed, as I always have a PgAdmin4 window open for monitoring database usage on our three servers. I will add a new Annex to the next release of my notes.
73
Gwyn G3ZIL


Using VS Code for Queries

Greg Beam
 

Hello All,

In addition to my PgAdmin4 post, you can also use VS Code with the PostgreSQL Explorer extension to test your queries before coding them up into scripts or whatever one needs to do.

Connection is similar to PgAdmin4; however, I created two different connections: 1) for the wsprnet db, and 2) for the tutorials db.

Again, this is just a shortcut to or alternative for using psql from the command line.

Vertex Latitude example from Gwyn via VS Code is attached

73's
Greg, KI7MT


Re: Using PgAdmin4 for Queries

Greg Beam
 

Here are the PgAdmin4 formatted queries for Section 3 and 4 of Gwyn's Database Notes.


Using PgAdmin4 for Queries

Greg Beam
 

Hello All,

Don't burn me at the stake, but, for work I have to use SQL Server Management Studio (SSMS) or Azure Data Studio for all things SQL Server / Data Warehouse. At home, and for most of my projects, I use PgAdmin4 and find it very helpful / useful on many fronts.

I was reading through the latest version of Gwyn's Database Notes and decided to test out some of the queries from the guide using PgAdmin4. Adding the server and connection string was a breeze, and the queries are very snappy.

So for those that are all thumbs on the command line and just want to do some basic queries (or as advanced as you'd like), PgAdmin4 may be a good alternative to psql command-line mojo.

73's
Greg, KI7MT


Re: New to WSPR Daemon

Greg Beam
 
Edited

Hi Gwyn,

I've been slammed with web / domain transfer work the last few weeks and haven't had much time for radio related activity, though I did manage to splash out for a new rig: Elecraft => K4HD

Thanks for the additional links to your project objectives as that clarified a number of things.

Regarding PySpark - I used that because it was / is easy to get up and running. However, the real power / speed with Spark (IMHO) comes with Scala. It's much faster as it's compiled rather than interpreted at runtime. It's also, like Java, type-safe, so there's little-to-no concern about mangling data types during one's processing steps.

Paring down (map-reduce) larger data sets into fact tables is where I'll probably focus most of my efforts, as that allows any number of rendering / graphing engines to easily consume the output. Getting the data into a usable structure is my first goal. After thinking on this more, I'll probably create an installable Linux package that wraps each Scala JAR file assembly with a bash script so it's easier for end users to call. This is what I'm pondering, but I've not set anything into motion yet.

Re ClickHouse - it looks very interesting (and fast). That may be the fastest solution I've seen to date for consuming large chunks of data in a columnar format. I glanced through the docs; it looks to be very powerful indeed and definitely warrants a more thorough read.

73's
Greg, KI7MT


Re: Update to WsprDaemon Timescale Database Guide V2-1

Gwyn Griffiths
 
Edited

Following the posts to this forum on data presentation and analysis by Andi Fugard (using R) and Greg Beam (using PySpark and other 'big data' packages), I have added summaries of their methods in an update to the WsprDaemon Timescale Database Guide. I'm grateful to them for permission to do so and for expanding the range of tools available to work with WSPR data, and I fully acknowledge, and provide links to, their work in the Guide.

Arne, who created the wspr.live website, which has tools to access a copy of the entire (yes, entire) WSPR data set from inception, made contact with us and is now using the WsprDaemon wsprnet database to populate his database with new spots. The database behind wspr.live is ClickHouse, and the new version of our Guide includes an introduction to ClickHouse, including some comparative execution times. These show speed-up factors of between 9 and 61 against TimescaleDB.
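
For a feel of what such a ClickHouse query looks like from Python, here is a minimal sketch using the clickhouse-driver package. The host, credentials, and the wspr.rx table and column names are assumptions for illustration only; take the real connection details from the Guide.

    from clickhouse_driver import Client

    # Placeholder connection details - replace with the values given in the Guide.
    client = Client(host="clickhouse.example.org", user="readonly", password="********")

    # Hypothetical table/column names: count spots per band over one day.
    rows = client.execute("""
        SELECT band, count() AS spots
        FROM wspr.rx
        WHERE time >= toDateTime('2021-01-01 00:00:00')
          AND time <  toDateTime('2021-01-02 00:00:00')
        GROUP BY band
        ORDER BY band
    """)
    for band, spots in rows:
        print(band, spots)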

Gwyn G3ZIL


Re: Example query and analysis using R

Gwyn Griffiths
 

Excellent, many thanks Andi,

I will write a section for the TimescaleDB Guide this week.
When I was working I had R on a Mac for many years but did not touch it; I found JMP was enough for my needs.
But as part of this write-up I'll reinstall R and run your examples for myself.

73
Gwyn G3ZIL


Re: New to WSPR Daemon

Gwyn Griffiths
 
Edited

Hello Greg
On your recent points:
1. We have not discussed archival; our current offering is access to online, uncompressed data for an 11-year sunspot cycle, as we've described in a 2020 TAPR/ARRL Digital Communications Conference paper at https://files.tapr.org/meetings/DCC_2020/2020DCC_G3ZIL.pdf

2. To support that, we have an Enterprise licence from TimescaleDB allowing automatic data tiering between main memory (192 GB), SSD disk (550 GB) and the 7 TB RAID. Both are already pretty hefty ... See https://docs.timescale.com/latest/using-timescaledb/data-tiering

3. We're using 30-day 'chunks', in TimescaleDB jargon, so their size on disk varies. The current chunk is entirely in main memory.

4. There's an outline diagram at the bottom of the page at http://wsprdaemon.org/technical.html
    You'll see two Rob-owned servers at independent sites. They take in data independently and provide resilience. There's also a third machine, a rented Digital Ocean Droplet with just the latest 7 days of data, to serve immediate 'now' data needs should there be problems with both main servers.

5. Thanks for your comments on Aggregates - I'll post a comment when I have some results to share.

6. As for public APIs: VK7JJ, WSPRWatch and Jim Lill's site at http://jimlill.com:8088/today_int.html already access WsprDaemon using three different methods (node.js, Swift, and bash/psql), and we've had a recent post in this forum on using R. This is how we would like to work - leaving the public-facing interfaces to others.

7. My documentation at
http://wsprdaemon.org/ewExternalFiles/Timescale_wsprdaemon_database_queries_and_APIs_V2.pdf
currently provides detailed instructions for access via node.js, Python, bash/psql, KNIME and Octave, and provides links for seven other methods. I'd envisage adding a detailed section on the method you intend to use when available, and I'll be adding a detailed section on R this coming week based on material from Andi on this forum.
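
For anyone who wants a quick feel for the Python route before opening the Guide, here is a minimal sketch using psycopg2. The host, credentials, and the spots table and column names are placeholders, not the real ones - take those from the Guide.

    import psycopg2

    # Placeholder connection details - use the read-only credentials from the Guide.
    conn = psycopg2.connect(host="wsprdaemon-server.example.org", dbname="wsprnet",
                            user="readonly", password="********")

    with conn, conn.cursor() as cur:
        # Hypothetical query: spots per band over the last 24 hours.
        cur.execute("""
            SELECT band, count(*) AS spots
            FROM spots
            WHERE time > now() - interval '24 hours'
            GROUP BY band
            ORDER BY band;
        """)
        for band, spots in cur.fetchall():
            print(band, spots)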

best wishes
Gwyn G3ZIL


Re: New to WSPR Daemon

Greg Beam
 

Hi Rob, Gwyn,

Apologies for the spam - I somehow sent the message before I was done writing it (too many thumbs, I guess).

In any case, I suspect your storage needs will differ substantially based on use cases. Like I was saying:

  • How much data do you want to provide on your real-time endpoints (hot storage) in the PostgreSQL DBs
  • How much, where and in what format to store long-term archives (cold storage): gz, zip, parquet, avro, etc.

I've used Timescale some, but only for personal learning / testing, never in a production environment. The Aggregate Functions (continuous or triggered) look to be a really cool feature. The materialized tables would be what I was referring to above as Fact Tables. I would be interested in seeing how that works with a constant in-flow of data, as Materialized Views in PostgreSQL can put a heavy load on servers with large datasets.

I would think, at some point, you'll need/want an API on the front end rather than having public users go to the database directly (could be wrong). That could help determine which Materialized Views (Aggregates) you want to provide via public APIs and for which you provide instructions so users can build their own datasets from cold-storage files. Either way, it would take a hefty PostgreSQL server to handle years of data at the scale you're forecasting here.

I saw the VK7JJ and WSPR Watch 3rd-party interfaces. I don't know their implementation details (I suspect they go direct to the DB), so it's hard to say, but Parquet files would not be a good solution for that type of dynamic need.

73's
Greg, KI7MT


Re: Example query and analysis using R

Andi Fugard M0INF
 

Hello again,

I have added more examples:
https://inductivestep.github.io/WSPR-analysis/

Best wishes,

Andi


Re: New to WSPR Daemon

Gwyn Griffiths
 

Rob, Greg
Rob - Continuous aggregates are on my to-do list of topics to add to the examples I have in my TimescaleDB Guide. I'll need to check whether they can be used with aggregates such as percentiles as well as the example aggregates provided by Timescale. Hourly counts and averages spring immediately to mind. One approach would be to figure out what would make a useful Grafana dashboard that only used aggregates.
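
For readers who haven't met continuous aggregates, below is a minimal sketch of such an hourly rollup, run from Python via psycopg2 and assuming TimescaleDB 2.x. The table and column names (spots, time, band, snr) and the connection string are assumptions for illustration, not the actual WsprDaemon schema.

    import psycopg2

    # Hypothetical hourly rollup: spot count and mean SNR per band.
    DDL = """
    CREATE MATERIALIZED VIEW IF NOT EXISTS spots_hourly
    WITH (timescaledb.continuous) AS
    SELECT time_bucket('1 hour', time) AS hour,
           band,
           count(*) AS spot_count,
           avg(snr) AS mean_snr
    FROM spots
    GROUP BY hour, band;
    """

    conn = psycopg2.connect("dbname=wsprnet user=postgres")  # placeholder DSN
    conn.autocommit = True  # continuous aggregates cannot be created inside a transaction
    with conn.cursor() as cur:
        cur.execute(DDL)
        # Keep the aggregate refreshed automatically, trailing the live data by an hour.
        cur.execute("""
            SELECT add_continuous_aggregate_policy('spots_hourly',
                start_offset      => INTERVAL '3 hours',
                end_offset        => INTERVAL '1 hour',
                schedule_interval => INTERVAL '1 hour');
        """)

A Grafana dashboard could then point at spots_hourly instead of the raw hypertable.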

Greg - Thank you for more thought-provoking points. I wonder if discussion during the weekly Wednesday WsprDaemon Zoom meetings that Rob holds might be useful? For me personally, I have learnt as I've implemented the WsprDaemon database, first with Influx and then Timescale; not the best way to learn, but it has produced a good (if not the 'best') approach to storing data and serving it to users with a whole host of applications via a growing number of interfaces.

73
Gwyn G3ZIL


Re: New to WSPR Daemon

Rob Robinett
 

Hi,

I am very pleased to see this discussion.  A recent email from Timescale suggests that long time span queries can be accelerated by defining continuous aggregate tables.  I wonder if those would help us as our database grows?


Rob

On Sat, Jan 9, 2021 at 3:53 AM Greg Beam <ki7mt01@...> wrote:

Hi Gwyn,

A couple of points to address here.

Regarding On-Disk File Size(s)
I too was looking for a solution for this, which is why I looked toward Parquet / Avro. Both are binary file formats that have the schemas embedded within them. From them, you could derive your Fact Tables (the things needed for plot rendering). Typically, you have a Master DataSet containing all rows / columns, then create sub-set fact tables, or in some cases a separate Parquet DataSet that you serve your plots with. This could be any combination of the Master columns and rows. Reducing that set down to only what's needed for a particular plot can yield huge disk savings and read-speed increases.

File Compression
Using Parquet / Avro file formats dramatically saves on long-term disk space usage. This is why I created the Pandas Parquet Compression Test. As you can see, the base file size was about 3.7 GB and Snappy compression (the default Parquet compression) comes in at 667 MB, or roughly a 5 to 1 reduction. Gzip and Brotli come in a couple of hundred MB smaller (440 MB to 470 MB or so) if one is really crunched for disk space.

Read Speeds
With those high compression levels, I was concerned about read speeds, but that turned out to be a non-issue. During my PyArrow Read Tests, I was able to read 47+ million rows and do a groupby and count in <= 2.01 seconds with Snappy and Brotli. That's fast considering I was reading all rows and all columns. Read times would be much faster on a limited DataFrame that pulls in only selected columns.

In any case, there are lots of ways to clean this fish, but having a good idea of what your output needs will be, at least initially, can help define your back-end source file strategy. While databases certainly make it easy (initially), they aren't always the best long-term solution with large datasets. I've been breaking my groups up into yearly blocks. The cool thing about Parquet is that you can append to the storage rather easily. If I need multiple years, I just do two year-group queries. You could add them all together, but that can get really large, and DataFrames need to fit into memory, as that's where Spark does its processing.

73's
Greg, KI7MT



--
Rob Robinett
AI6VN
mobile: +1 650 218 8896


Re: New to WSPR Daemon

Greg Beam
 

Hi Gwyn,

A couple of points to address here.

Regarding On-Disk File Size(s)
I too was looking for a solution for this, which is why I looked toward Parquet / Avro. Both are binary file formats that have the schemas embedded within them. From them, you could derive your Fact Tables (the things needed for plot rendering). Typically, you have a Master DataSet containing all rows / columns, then create sub-set fact tables, or in some cases a separate Parquet DataSet that you serve your plots with. This could be any combination of the Master columns and rows. Reducing that set down to only what's needed for a particular plot can yield huge disk savings and read-speed increases.

File Compression
Using Parquet / Avro file formats dramatically saves on long-term disk space usage. This is why I created the Pandas Parquet Compression Test. As you can see, the base file size was about 3.7 GB and Snappy compression (the default Parquet compression) comes in at 667 MB, or roughly a 5 to 1 reduction. Gzip and Brotli come in a couple of hundred MB smaller (440 MB to 470 MB or so) if one is really crunched for disk space.
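
The comparison itself is only a few lines of pandas. Here is a minimal sketch; the file name and placeholder column names are hypothetical, and the exact sizes will depend on the month being converted.

    import pandas as pd

    # Hypothetical monthly wsprnet spot file; the raw CSV has no header row and ~15 columns.
    cols = [f"col{i}" for i in range(15)]          # placeholder column names
    df = pd.read_csv("wsprspots-2020-11.csv", header=None, names=cols)

    # Write the same frame with the three codecs compared above, then compare file sizes.
    for codec in ("snappy", "gzip", "brotli"):
        df.to_parquet(f"wsprspots-2020-11.{codec}.parquet", compression=codec)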

Read Speeds
With those high compression levels, I was concerned about read speeds, but that turned out to be a non-issue. During my PyArrow Read Tests, I was able to read 47+ million rows and do a groupby and count in <= 2.01 seconds with Snappy and Brotli. That's fast considering I was reading all rows and all columns. Read times would be much faster on a limited DataFrame that pulls in only selected columns.

In any case, there are lots of ways to clean this fish, but having a good idea of what your output needs will be, at least initially, can help define your back-end source file strategy. While databases certainly make it easy (initially), they aren't always the best long-term solution with large datasets. I've been breaking my groups up into yearly blocks. The cool thing about Parquet is that you can append to the storage rather easily. If I need multiple years, I just do two year-group queries. You could add them all together, but that can get really large, and DataFrames need to fit into memory, as that's where Spark does its processing.
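
For a sense of what that read test looks like, here is a minimal PyArrow sketch; the file name and the band column are assumptions about the schema, not the actual layout.

    import pyarrow.parquet as pq

    # Read all rows into an Arrow table, then hand off to pandas for the groupby.
    table = pq.read_table("wsprspots-2020-11.snappy.parquet")
    df = table.to_pandas()

    # Count spots per band - roughly the groupby-and-count being timed above.
    print(df.groupby("band").size().sort_values(ascending=False))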

73's
Greg, KI7MT


Re: New to WSPR Daemon

Gwyn Griffiths
 
Edited

Hello Greg
Thanks for the additional explanations and details. They give a useful picture of where you are with CSV file data and the approaches that you take. It is clear that, once past the first stage of getting the data columns you want, the subsequent steps from wsprnet CSV or WsprDaemon TimescaleDB files will be the same. As fields such as numerical lat and lon for tx and rx are already in WsprDaemon, this may reduce the load at the analysis steps - that was our hope.

I am in no doubt that multithread (and cluster) approaches are needed with this data. WsprDaemon already has 390 million spots online (from July 2020 onward) and this can, and does, result in slow responses to a number of queries and hence to the Grafana graphics. For now, in the scheme of things, being able to see a plot of spot count per hour of day for each day over six months in a few tens of seconds is still useful, and a marvel.

But these 390 million spots are only taking up 138 GB of the 7 TB disk space Rob has made available - so different approaches, such as those you describe, are going to be needed to look at data over the whole sunspot cycle that WsprDaemon should be able to hold.

Thanks for permission to abstract from your posts on this topic for our TimescaleDB guide.

73
Gwyn G3ZIL


Re: New to WSPR Daemon

Greg Beam
 

Hi Gwyn,

I should probably clarify the use of these types of tools a bit more so as not to confuse folks. What I've added so far targets WSPRnet CSV files. I'll be adding the same or similar for the WSPR Daemon schemas.

The primary purpose of Spark is to map-reduce a given DataFrame / DataSet. Say, for example, you have a year's worth of spot data and want to plot, compare, or otherwise process it. The steps would go something like:

  • Select just the columns you want from the Parquet partitions (timestamp, field1, field2, field3, etc.)
  • Perform the aggregations (SUM, AVG, Count, STATS, Compare, or whatever you need)
  • Plot or save the results to csv/json or post to a DB Fact Table.
The plot or save stage is where the performance increase comes in, as it's all done in parallel on a Spark cluster (standalone or multi-node). While this doesn't sound overly impressive, it is. Consider the November 2020 WSPRnet CSV file: it has 70+ million rows of data across 15 columns. When one adds the remainder of the year, you could easily be over 500 million rows of data. Doing aggregate functions on datasets of that scale can be very expensive time-wise. If one has 20 or so results to process every day of every month, down to the hour level, in a rolling fashion, it becomes impractical to do in a single-threaded call.
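
A minimal PySpark sketch of those three steps is below. The paths and column names (timestamp, band, snr) are assumptions about the Parquet schema, not the actual layout.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("wspr-fact-table").getOrCreate()

    # 1) Select just the columns needed from the Parquet partitions.
    spots = spark.read.parquet("/data/wsprnet/2020/*.parquet") \
                 .select("timestamp", "band", "snr")

    # 2) Perform the aggregations - hourly spot count and mean SNR per band.
    hourly = (spots
              .groupBy(F.date_trunc("hour", "timestamp").alias("hour"), "band")
              .agg(F.count("*").alias("spots"), F.avg("snr").alias("mean_snr")))

    # 3) Save the result as a fact table that plotting engines can consume directly.
    hourly.write.mode("overwrite").parquet("/data/facts/hourly_band_stats")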

I've not added any streaming functions, but Spark also allows for continuous ingestion of data from file/folder monitoring, UDP ports, channels, and other sources. I can see many use-cases with WSPR Daemon and Spark stream processing of spot data from multiple Kiwis with multi-channel monitoring on each device. You could use it to process the data, or simply post it to a staging table for downstream analytics. Staging data for downstream activity is commonly used for things like server logs or web-page clicks from millions of users. However, it doesn't matter what the source data is, only that it's coming in at intervals or continuously.
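
As an illustration of the streaming side, here is a minimal Structured Streaming sketch that watches a folder for new CSV spot files and maintains hourly counts per band. The folder paths and the schema are assumptions for illustration only.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, TimestampType, StringType, DoubleType

    spark = SparkSession.builder.appName("wspr-stream").getOrCreate()

    # Assumed minimal schema for incoming spot lines.
    schema = StructType([
        StructField("timestamp", TimestampType()),
        StructField("reporter", StringType()),
        StructField("band", StringType()),
        StructField("snr", DoubleType()),
    ])

    # Continuously ingest any new CSV files dropped into the watched folder.
    spots = spark.readStream.schema(schema).csv("/data/incoming_spots")

    # Rolling hourly counts per band; the watermark lets Spark finalize old windows.
    hourly = (spots
              .withWatermark("timestamp", "2 hours")
              .groupBy(F.window("timestamp", "1 hour"), "band")
              .count())

    # Append finalized windows to a staging area for downstream analytics.
    query = (hourly.writeStream
             .outputMode("append")
             .format("parquet")
             .option("path", "/data/staging/hourly_counts")
             .option("checkpointLocation", "/data/checkpoints/hourly_counts")
             .start())
    query.awaitTermination()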

If you're into machine learning and predictive analytics, Spark's MLlib provides a powerful set of tools as well.

Essentially, Spark provides (in cluster or standalone modes):

- DataSet / DataFrame map-reduce capabilities
- Stream processing of data and files
- Machine learning and predictive analytics (MLlib)

73's
Greg, KI7MT


Re: WD config errors

John
 

Thank you Rob.
All working well.

John
TI4JWC
