Edmonton Linux Users Group
https://groups.io/g/elug
<p>
The Edmonton Linux Users' Group is based in
<a href="http://www.edmonton.ca/" target="_blank">Edmonton</a>
and open to anyone who is interested in or is running Linux.
Joining ELUG is simply a matter of signing up for this ELUG group on groups.io.
</p>
<ul><li>ELUG is a good
place to ask questions or maybe just introduce yourself if you are new to the Linux community.<br></li><li>There are no regularly scheduled meetings but we do invite you to connect with us on
<a href="http://www.meetup.com/Edmonton-Linux-User-Group/">Meetup.com</a>.
Any Linux or Open Source related topics are welcome.</li><li>An IRC channel has been registered at irc.freenode.net: #edmonton-lug</li></ul>Tue, 10 Mar 2015 11:46:52 -0700Re: Tests of distributions - Installing R (CRAN) packages
https://groups.io/g/elug/message/4213
<div dir="ltr"><div>It doesn't sound like you are working with many data points. But I should point out that R runs on memory. If you don't have enough RAM, it can choke. <br/></div><div><br/></div><div>If you think R is crashing, the first step is to run your R code from the R prompt or in an R IDE. If it's successful there, then R itself is not the problem.</div><div><br/></div><div>Jared<br/></div></div>jaredprins@... (Jared )Mon, 12 Nov 2018 08:26:24 -0800Re: Tests of distributions - Installing R (CRAN) packages
https://groups.io/g/elug/message/4212
On Sat, 10 Nov 2018 07:04:26 -0700<br />"Gordon Haverland" <ghaverla@...> wrote:<br /><blockquote>The data I am looking at, ...</blockquote>Assuming the AD test is the one I need, I started playing.<br /><br />I generated 2 vectors of Gaussian deviates of the same size with a<br />mean of 0.5 and an SD of 0.1. And it turns out that if you double the<br />number of deviates, the range sort of increases by a factor of 2. Sort<br />of. Maybe.<br /><br />I sorted the vector to manipulate.<br /><br />I did an AD test on the two same length vectors and kSamples produced<br />some kind of output. I then shifted and popped the first and last<br />samples off (which leaves the median at the same value), and did the<br />test again. Repeat, ...<br /><br />With 20 data points in the original data, by the time I had<br />shift/popped 5 times the AD test still wasn't seeing a significant<br />difference.<br /><br />Running with 40 data points in the original data, doing the shift/pop 4<br />times gets me to 32 data points (original range 0.43, new data range<br />0.18) which is just at the 5% threshold of being declared different.<br /><br />So, I am guessing that for my football (soccer) data, I really want at<br />least 40 data points in any "long" vector, and that I want my "short<br />vector" to be probably more than 20.<br /><br />A problem with looking for patterns in football data is typically not<br />enough data. It is not unusual for a game to end with 0, 1 or 2<br />goals. That games can be changed from a loss to a tie, a win to a<br />tie, ... by the issuing of a penalty (especially late in the game)<br />results in a lot of hard feelings.<br /><br />But the problem of insufficient data shows up in all kinds of problems. A farmer<br />would like to take in a single sample of 1 teaspoon of soil, for a<br />soil test. That would not take long, and is probably easy and cheap.<br />One small soil test isn't going to provide any useful information. 
You<br />need some large number of samples, and each sample needs to be larger<br />than a teaspoon. Larger samples, more cost. More samples, more cost.<br />So this is more squared.<br /><br />The cost problem is severely aggravated by the "charge what the market<br />will bear" model of pricing. That model of pricing is predisposed to<br />ignoring the people who need low costs because of how they sample the<br />market. Once ignored, the MBAs determining costs never consider that<br />segment of the economy ever again. Unless someone starts a new company<br />looking to service this now ignored market segment, this part of the<br />market will continue to be ignored forever. But even when the market<br />changes (shifts) and becomes stable, the MBA need for ever increasing<br />income means that prices will tend to go up all the time. Because it<br />isn't enough to make a profit, you must have increasing profit from<br />year to year.<br /><br />What the market will bear.<br /><br />What we need is to put all these MBAs in a big cage with some very<br />hungry bears. And let the bears determine what the price should be.<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Sun, 11 Nov 2018 16:14:03 -0800Re: Tests of distributions - Installing R (CRAN) packages
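The shift/pop experiment described in the message above can be sketched in a few lines. This is a minimal Python stand-in, not the Perl-plus-R setup the poster used, and it does not run the Anderson-Darling test itself. The seed of 42 and the helper name `trim_extremes` are assumptions made up for illustration. It only shows that dropping the smallest and largest value each round shrinks the range while leaving the median where it was.

```python
import random
import statistics


def trim_extremes(sorted_values, times):
    """Drop the smallest and largest value `times` times (one shift plus one pop each)."""
    if times == 0:
        return list(sorted_values)
    return list(sorted_values[times:-times])


random.seed(42)  # assumed seed, used only so repeated runs match
# 40 Gaussian deviates with mean 0.5 and SD 0.1, sorted as in the post
data = sorted(random.gauss(0.5, 0.1) for _ in range(40))

for rounds in range(5):
    trimmed = trim_extremes(data, rounds)
    value_range = trimmed[-1] - trimmed[0]
    # Columns: number of points left, median, range
    print(len(trimmed), round(statistics.median(trimmed), 4), round(value_range, 4))
```

Each trimmed vector could then be handed to whatever two-sample test is being evaluated, alongside an untrimmed vector of the same origin.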
https://groups.io/g/elug/message/4211
The data I am looking at is the distribution of possession time in<br />association football (soccer). In particular, the English Premier<br />League.<br /><br />In most of the big leagues, there are one or two dominant teams in the<br />top league of a country. The Bundesliga in Germany is mostly Bayern<br />Munich. LaLiga in Spain is mostly Real Madrid and Barcelona (sometimes<br />Atletico Madrid as well). And so on.<br /><br />For years, the EPL had a Top-4. Over the last couple of years, it<br />seems to have expanded into a Top-6. The other 14 teams, I refer to as<br />Rest Of The Pack.<br /><br />If a Top-6 team plays another Top-6 team, or a ROTP team plays a ROTP<br />team, you might see one team having 65% (or so) possession. But, if a<br />Top-6 team plays a ROTP team, so far this season the highest possession<br />was 81% (to the Top-6 team).<br /><br />The K-S test is not sensitive to differences in range. So it is<br />inappropriate for my needs, as range is one place where there should be<br />differences. The Anderson-Darling (AD) test is supposed to be more<br />sensitive to the range.<br /><br />I don't know if the other tests in kSamples are appropriate. I am<br />having some problems understanding why some of the tests don't seem to<br />work (I am guessing that R is crashing, and that the pipe used to<br />communicate between R and Perl only ends up holding the old content, and<br />so the next read returns the old data).<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Sat, 10 Nov 2018 06:05:23 -0800Re: Robust fitting of data - Installing R (CRAN) packages
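The remark above about the K-S test and range can be made concrete with a small sketch. The two-sample K-S statistic is the largest vertical gap between the two empirical distribution functions. The function name `ks_statistic` and the toy numbers below are made up for illustration; they are not possession data from the thread. This is plain Python rather than the kSamples package.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both CDFs are step functions, so the gap only changes at observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Toy values: same middle, very different extremes.
narrow = [0.45, 0.48, 0.50, 0.52, 0.55]
wide = [0.19, 0.48, 0.50, 0.52, 0.81]
print(ks_statistic(narrow, wide))
```

In this toy case only the lowest and highest values differ, so the two step functions disagree only in the tails and by a single step each, however far out the extreme values sit.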
https://groups.io/g/elug/message/4210
On Fri, 9 Nov 2018 11:57:34 -0700<br />"Jared " <<a href="/profile/jared">@jared</a>> wrote:<br /><blockquote>Gord, that was the best thing I read all week: "Slapping the name<br />robust on something, doesn't mean it does what you think it does."</blockquote>Reading about when to _NOT_ use Anderson-Darling test, there was a<br />Google snippet that suggested a person could do "real statistics" with<br />Excel. I suspect if you got it to average one number, it would divide<br />by N-1 someplace. :-)<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Fri, 09 Nov 2018 15:07:28 -0800Re: Robust fitting of data - Installing R (CRAN) packages
https://groups.io/g/elug/message/4209
On Fri, 9 Nov 2018 11:57:34 -0700<br />"Jared " <<a href="/profile/jared">@jared</a>> wrote:<br /><blockquote>Gord, that was the best thing I read all week: "Slapping the name<br />robust on something, doesn't mean it does what you think it does."<br /><br />I love it.</blockquote>Wonderful. :-)<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Fri, 09 Nov 2018 11:44:22 -0800Re: Robust fitting of data - Installing R (CRAN) packages
https://groups.io/g/elug/message/4208
<div><div>Gord, that was the best thing I read all week: "Slapping the name robust on something, doesn't mean it does what you think it does."<br/><br/>I love it.</div><div><br/>Jared<br/><br/><br/></div></div>jaredprins@... (Jared )Fri, 09 Nov 2018 10:57:47 -0800Re: Robust fitting of data - Installing R (CRAN) packages
https://groups.io/g/elug/message/4207
On Fri, 9 Nov 2018 11:36:29 -0700<br />"Jared " <<a href="/profile/jared">@jared</a>> wrote:<br /><blockquote>This sounds like regression through the origin?</blockquote>Yep.<br /><br /><blockquote>I think your degrees of freedom drop by one, which is fine as long as<br />your data set is not too small.</blockquote>I believe you drop one as well.<br /><br /><blockquote>With the intercept dropped, things are calculated differently or have<br />to be interpreted differently.<br /><br />Be careful.</blockquote>I think you need to be careful with just about anything robust.<br />Slapping the name robust on something, doesn't mean it does what you<br />think it does.<br /><br />But, I do like the idea that one has enough data, that you don't need<br />to concern yourself whether you have an odd or even number of data<br />points to calculate the median.<br /><br />On the distribution side, I am trying to learn about K-S type tests<br />from the R kSamples module. My data range is finite.<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Fri, 09 Nov 2018 10:44:57 -0800Re: Robust fitting of data - Installing R (CRAN) packages
https://groups.io/g/elug/message/4206
<div>This sounds like regression through the origin?<div><br/></div><div>I think your degrees of freedom drop by one, which is fine as long as your data set is not too small.</div><div><br/></div><div>With the intercept dropped, things are calculated differently or have to be interpreted differently. </div><div><br/></div><div>Be careful.</div><div><br/></div><div>Jared</div></div><br/>jaredprins@... (Jared )Fri, 09 Nov 2018 10:36:44 -0800Re: Tests of distributions - Installing R (CRAN) packages
https://groups.io/g/elug/message/4205
<div>In R, modules are called packages. Some packages need to be compiled from source. Most are binary packages that can be uncompressed and the folder copied to your R library folder. But that's not the easiest way to install packages.<div><br/></div><div>If you haven't installed any libraries, then it won't have created your personal library yet. </div><div><br/></div><div><span style="font-family:sans-serif;">In bash type R and then at the R prompt type install.packages("kSamples"). It will do all the work for you.</span></div><div><span style="font-family:sans-serif;"><br/></span></div><div><span style="font-family:sans-serif;">It might ask you to create a personal library to store the binary package, unzipped.</span></div><div><span style="font-family:sans-serif;"><br/></span></div><div><span style="font-family:sans-serif;">Type .libPaths() and you should see the folder where libraries are. One for your personal packages and maybe one for R core packages.</span><br/></div><div><span style="font-family:sans-serif;"><br/></span></div><div><span style="font-family:sans-serif;">Jared</span></div></div><br/>jaredprins@... (Jared )Fri, 09 Nov 2018 10:20:47 -0800Re: Tests of distributions - Installing R (CRAN) packages
https://groups.io/g/elug/message/4204
On Thu, 8 Nov 2018 20:24:08 -0700<br />"Gordon Haverland" <ghaverla@...> wrote:<br /><blockquote>install.packages("kSamples")</blockquote>Talking to myself.<br /><br />I started a short perl script inside emacs with perldb, which has<br />use Statistics::R;<br />and creates the R "object".<br /><br />I created 2 vectors (lists) that were the same length in Perl, and then<br />'set' them in R, asked R to multiply them together, and then did a 'get'<br />of the result. The result printed fine in Perl (in the debugger).<br /><br />I then asked the R object to load the kSamples library.<br /><br />$R->run(q`library(kSamples)`);<br /><br />For those not familiar with Perl, there is a quoting mechanism involved<br />there (q or qq or others(?)). In this instance, I am quoting with<br />backticks.<br /><br />One of the particular tests in kSamples is the Anderson-Darling test<br />(which is supposed to be a step or two up from the K-S test). And<br />running the example from the kSamples project on GitHub (or a copy of<br />it), I did compare the two vectors using the A-D test.<br /><br />Relatively painless. It is possible to run the test without directing<br />output anywhere. I am guessing this ends up in some default output<br />variable, but doing something like:<br />my $o2 = $R->run(q`ad.test(...)`);<br />captures the output as a text string into the variable $o2, which can<br />be printed directly.<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Fri, 09 Nov 2018 08:11:02 -0800Re: Tests of distributions - Installing R (CRAN) packages
https://groups.io/g/elug/message/4203
Using su - to become root and cd'ing to root's home directory, I<br />started a "R" shell with the command "R". Which worked. I then issued<br />the command<br />install.packages("kSamples")<br /><br />This downloaded, compiled and installed things. As compiling was part<br />of this, source code must be somewhere. The screen output shows the<br />source code is someplace in /tmp, which means it will get deleted at<br />some point. Not really what I was expecting. I'm not sure if the<br />tarball is somewhere permanent. I have not run the newly installed<br />package yet. I didn't see anything which looked like running a test<br />suite against the package, to see that it works.<br /><br />-- <br />Gordghaverla@... (Gordon Haverland)Thu, 08 Nov 2018 19:24:13 -0800Re: Robust fitting of data - Installing R (CRAN) packages
https://groups.io/g/elug/message/4202
On Thu, 8 Nov 2018 15:58:06 -0700<br />"Gordon Haverland" <ghaverla@...> wrote:<br /><blockquote>Like a lot of things in statistics,</blockquote>This has nothing to do with comparing distributions, but is an example<br />of what computers can bring you.<br /><br />The mean is a measure of central tendency. It is not the only one.<br />The median is the value which is "half way", 50% is below and 50% is<br />above. For a single-moded distribution, the mode is the most common<br />value.<br /><br />For symmetric distributions, the mean, median and mode should all be<br />equal.<br /><br />Calculating means (averages, expectations) in the presence of outliers<br />gives answers that differ from what should be found. It turns out the<br />median is a more robust measure of central tendency. If you calculate<br />the median in the presence of some (not a lot) of outliers, you<br />probably do much better than calculating averages.<br /><br />Numerical Recipes has a function for doing a median fit of a straight<br />line to data. This is as opposed to a least squares fit.<br /><br />Let's say you have a data set, and you add one point to the data set.<br />And then you fit via least squares and you fit via a median method, and<br />you look at how the parameters of the fitted straight line change as a<br />function of where this extra point is (you are moving this extra point<br />around). You are probing the sensitivity of the calculated parameters<br />to the presence of this extra data point. The values found from<br />least squares will vary smoothly with the position of this extra data<br />point. The values of the median fit will change discontinuously as a<br />function of where this extra point is (there will be jumps in<br />parameters).<br /><br />A reasonable thing to do with any data set, is to calculate the average<br />X and Y of the data, and then make up a new data set where you subtract<br />(<X>,<Y>) from each data point. 
A least squares fit to this new data<br />will pass through (0,0). Normally we assume that there is no error in<br />X and hence all the error is in Y. But if we have a reasonable amount<br />of data, the "error" in moving the data by subtracting off the average<br />of X and Y should not be large. What we are left with, is just to<br />calculate the slope of the line that goes through (0,0).<br /><br />Well, there is a way to robustly solve that problem - the Theil-Sen<br />estimator.<br /><br /><a href="https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator" target="_blank">https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator</a><br /><br />What you do (in theory) is calculate all the 2 point slopes possible in<br />the data, sort them and pick the one in the middle (the median). The<br />number of slopes you have to calculate becomes ridiculously large as<br />the number of data points increases, so there are ways to calculate<br />fewer slopes.<br /><br />Just in case you wanted to look at robust methods.<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Thu, 08 Nov 2018 15:38:43 -0800Re: Tests of distributions - Installing R (CRAN) packages
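The "all 2 point slopes, then take the median" recipe above is short enough to sketch directly. This is a plain Python illustration of the brute-force version only, with none of the shortcuts for calculating fewer slopes. The function names and the toy data (a straight line with one wild point) are assumptions made up for the example. A least squares slope is included purely for comparison.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(xs, ys):
    """Median of the slopes over all pairs of points with distinct x values."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # a vertical pair has no defined slope
    ]
    if not slopes:
        raise ValueError("need at least two points with different x values")
    return median(slopes)


def least_squares_slope(xs, ys):
    """Ordinary least squares slope, for comparison."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy data: y = 2x + 1, with the last point replaced by an outlier.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[-1] = 100

print(theil_sen_slope(xs, ys), least_squares_slope(xs, ys))
```

With n points there are n(n-1)/2 pairs, which is the "ridiculously large" growth mentioned above; for ten points that is only 45 slopes.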
https://groups.io/g/elug/message/4201
How do you test a distribution?<br /><br />Well, you have a set of data. We start by sorting the data. The lowest<br />value has no values below it, so it gets the value (Xl,0). The highest<br />value has no values above it, so it gets the value of (Xh,1). All the<br />other data points are now (Xi,fraction of the way, in sorted order, between Xl and Xh).<br /><br />You now plot (Xi,Yi). In general, you get some kind of sigmoid (S<br />shaped) curve. It is monotone increasing.<br /><br />You could smooth that curve (if you think the distribution is smooth).<br />If you have reason to believe your data (X,Y) is exact, you could fit a<br />cubic spline to the data and specify that the slope at 0 is 0, and the<br />slope at 1 is 0. That will probably introduce a little wiggle to the<br />spline fit, since we really only have slopes of 0 at the extremities of<br />the X variable, and not necessarily at the extremities of our sampled<br />data. The cubic spline I was first taught, is fitted by solving a<br />linear system for all the data points at one time. This means a little<br />error in one data point affects all parameters calculated. Which often<br />leads to wiggle. Some splines are "localised", the Akima spline is one<br />such (family of) spline.<br /><br />If you know something about the error in your data, you could calculate<br />a smoothing spline through the data.<br /><br />In any event there are lots of choices as to how to analyze things.<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Thu, 08 Nov 2018 15:38:33 -0800Tests of distributions - Installing R (CRAN) packages
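The sort-and-assign step described above can be sketched as follows. This is plain Python and follows the convention in the message (lowest value gets 0, highest gets 1), which differs slightly from the more common empirical CDF that assigns i/n. The function name `empirical_points` and the sample values are assumptions for illustration; no smoothing or spline fitting is attempted.

```python
def empirical_points(values):
    """Return (x, y) pairs: x sorted ascending, y rising evenly from 0 to 1."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    last = len(xs) - 1
    # The i-th sorted value sits i/last of the way from the lowest to the highest.
    return [(x, i / last) for i, x in enumerate(xs)]


for x, y in empirical_points([0.52, 0.47, 0.61, 0.39, 0.50]):
    print(x, y)
```

Plotting these pairs gives the monotone, roughly S-shaped curve mentioned above; ties in the data would show up as repeated x values with different y values.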
https://groups.io/g/elug/message/4200
Like a lot of things in statistics, you cannot prove that two<br />distributions are the same. What you can show is the probability is<br />larger than something based on some metric you calculate.<br /><br />The F test compares variances. The T test compares means. An older<br />one for distributions is the Kolmogorov–Smirnov test (K–S test or KS<br />test).<br /><br /><a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test" target="_blank">https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test</a><br /><br />A nice thing about the K-S test is that it is non-parametric. You can<br />use it for arbitrary distributions of data.<br /><br />Is it the best test? It's old, so probably not. Well, a little<br />looking around produced an astrophysics comment, which said that the<br />K-S test was bad for a bunch of well thought out reasons, and they<br />suggested that one use the Anderson-Darling test. This test was<br />invented in 1952. Which is probably old, so I went looking for<br />improvements on it. There was mention that Shapiro-Wilk was better.<br /><br />Well, it seems there is an R module (kSamples) which runs a few<br />different tests against data. There is a part of the tarball which<br />needs to be compiled.<br /><br />While I have installed modules for Perl in the past, I don't know<br />whether the way Debian sets up R allows this to happen easily.<br /><br />Has anyone installed modules "manually" in R? How manual is it? Do<br />you really need to know what you are doing?<br /><br />Thanks<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Thu, 08 Nov 2018 14:58:11 -0800Re: IBM Nears Deal to Acquire Software Maker Red Hat
https://groups.io/g/elug/message/4199
On Mon, 29 Oct 2018 09:07:13 -0400 (EDT)<br />"igoldberg1" <igoldberg1@...> wrote:<br /><blockquote>WHERE DID YOU HEAR THIS? THERE HAS BEEN NO NEWS OF THIS ANYWHERE ELSE.</blockquote>NPR.org has a version of this.<br /><br />-- <br /><br />Gordghaverla@... (Gordon Haverland)Mon, 29 Oct 2018 08:20:31 -0700Re: IBM Nears Deal to Acquire Software Maker Red Hat
https://groups.io/g/elug/message/4198
<div>They were talking about it this morning on 630 ched. 33 billion all cash deal is what they were reporting. </div>quilley.larry@... (Larry Quilley)Mon, 29 Oct 2018 06:56:11 -0700Re: IBM Nears Deal to Acquire Software Maker Red Hat
https://groups.io/g/elug/message/4197
<p>Try searching for this.</p>
<p>"red+hat+ibm" <br/>
</p>
<p>is a good search term.</p>
<p>Reuters</p>
<p>Bloomberg</p>
<p>et cetera</p>
<p>Or perhaps Red Hat itself:</p>
<p><a class="moz-txt-link-freetext" href="https://www.redhat.com/en/about/press-releases/ibm-acquire-red-hat-completely-changing-cloud-landscape-and-becoming-world%E2%80%99s-1-hybrid-cloud-provider" rel="nofollow">https://www.redhat.com/en/about/press-releases/ibm-acquire-red-hat-completely-changing-cloud-landscape-and-becoming-world%E2%80%99s-1-hybrid-cloud-provider</a></p>
<br/>mhilarius@... (Maurice Hilarius)Mon, 29 Oct 2018 06:33:32 -0700Re: IBM Nears Deal to Acquire Software Maker Red Hat
https://groups.io/g/elug/message/4196
On Mon, Oct 29, 2018 at 8:07 AM igoldberg1 <igoldberg1@...> wrote:<br /><blockquote><br />WHERE DID YOU HEAR THIS? THERE HAS BEEN NO NEWS OF THIS ANYWHERE ELSE.<br /></blockquote>Not sure what you are specifically referring to as no news available.<br />I do not watch the news and very rarely look for news online.<br />Yet I have run into both items mentioned earlier.<br />It is also quite easy to see where it would be in the best interests of the<br />principals for this news to not be widely touted - - - the repercussions<br />are sort of large.<br /><br />Regards<br /><br />Daraldo1bigtenor@... (o1bigtenor)Mon, 29 Oct 2018 06:27:27 -0700Re: IBM Nears Deal to Acquire Software Maker Red Hat
https://groups.io/g/elug/message/4195
<p style="font-size: 12pt; font-family: helvetica, arial, sans-serif; color: rgb(51, 51, 51);">WHERE DID YOU HEAR THIS? THERE HAS BEEN NO NEWS OF THIS ANYWHERE ELSE.<br/></p><p style="font-size: 12pt; font-family: helvetica, arial, sans-serif; color: rgb(51, 51, 51);"> iRA GOLDBERG<br/></p>igoldberg1@... (igoldberg1)Mon, 29 Oct 2018 06:07:18 -0700Re: IBM Nears Deal to Acquire Software Maker Red Hat
https://groups.io/g/elug/message/4194
<p>IBM is now worse than that.</p>
<p>Have you heard of the Phoenix payroll system paid for by the
Government of Canada?</p>
<p><a class="moz-txt-link-freetext" href="https://www.itworldcanada.com/article/phoenix-payroll-system-timeline-of-the-governments-problems/396407" rel="nofollow">https://www.itworldcanada.com/article/phoenix-payroll-system-timeline-of-the-governments-problems/396407</a></p>
<p>IBM crapware and failure.</p>
<br/>mhilarius@... (Maurice Hilarius)Sun, 28 Oct 2018 18:44:49 -0700