Out of Memory Error
jim.baird@...
Hi Mark
Many thanks, that seems to work fine. I think the only thing that has changed is that the “adopt” method now just takes the LR as a parameter (not the security info).
Best regards
Jim
Mark Greenwood
Looking at that code the problem is that you are loading all the documents into memory at the same time. Yes the documents need to be in a corpus, but no they don't need to be in memory. What you need to do is put the corpus into a SerialDatastore before you populate it. That will ensure the documents get saved to disc, and the controller will then load each in turn and then remove it from memory when done.

There is example code for using a serial datastore at https://gate.ac.uk/wiki/code-repository/src/sheffield/examples/DataStoreApplication.java -- I think the code should still work although I don't know when it was last tested.

Mark
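For reference, a minimal sketch of that datastore-backed setup, assuming the current Factory/DataStore API where adopt takes just the language resource (as noted above). It is untested, and the directory paths and class name are placeholders:

import java.io.File;
import java.io.FileFilter;

import gate.Corpus;
import gate.DataStore;
import gate.Factory;
import gate.Gate;

public class DataStoreCorpusSketch {

    public static void main(String[] args) throws Exception {
        Gate.init();

        // Create a new directory-backed serial datastore; the path is a
        // placeholder and should point at a new, empty location.
        File dsDir = new File("C:/Users/.../trainingDS");
        DataStore sds = Factory.createDataStore(
                "gate.persist.SerialDataStore", dsDir.toURI().toURL().toString());

        // Create the corpus and adopt it into the datastore BEFORE populating it,
        // so documents are persisted to disc rather than all held in memory.
        Corpus transientCorpus = Factory.newCorpus("Training corpus");
        Corpus traincorpus = (Corpus) sds.adopt(transientCorpus);
        sds.sync(traincorpus);

        // Populate the persistent corpus from the training directory, then sync.
        File corpusDirectory = new File("C:/Users/.../TestTrainSetAllTypes");
        FileFilter acceptAll = f -> true;
        traincorpus.populate(corpusDirectory.toURI().toURL(), acceptAll, "UTF-8", false);
        sds.sync(traincorpus);

        // The persistent corpus can then be set on the CorpusController and
        // executed exactly as in the training code further down the thread.
    }
}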
jim.baird@...
Mark, JD
Many thanks. The code is below. (There is also other document pre-processing code, where I create a Key set and add the class features, but I am not sure it is useful here other than to confirm that I am creating a lot of features.)
I seem to recall reading in the materials that the Learning Framework works on all documents in the corpus simultaneously (rather than sequentially), but if it were possible to process some or all of the documents in the corpus sequentially (and then remove them), might that have a 100-fold impact? Otherwise, I was not clear on how I could delete resources while the ML was running. I have tried restarting and closing memory-intensive applications.
My assumption was that the bottleneck was not in Eclipse, because Eclipse pauses while the ML runs in GATE. I have increased the Eclipse memory to 3 GB and still get the Out of Memory message.
Best regards
Jim
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

import javax.swing.SwingUtilities;

import gate.Corpus;
import gate.CorpusController;
import gate.Factory;
import gate.Gate;
import gate.creole.ResourceInstantiationException;
import gate.gui.MainFrame;
import gate.persist.PersistenceException;
import gate.util.GateException;
import gate.util.persistence.PersistenceManager;

public class TrainClassifier {

    public void trainClassifier() throws IOException, GateException,
            InvocationTargetException, InterruptedException {

        System.out.println("Loading Learning Framework");
        Gate.init();
        SwingUtilities.invokeAndWait(() -> MainFrame.getInstance().setVisible(true));

        // Load application from saved state
        System.out.println("Loading Training Module");
        CorpusController trainclass = (CorpusController) PersistenceManager
                .loadObjectFromFile(new File("C:\\Users\\....xgapp"));

        // create and populate the training corpus
        Corpus traincorpus = Factory.newCorpus("Training corpus");
        FileFilter acceptAllFileFilter = new FileFilter() {
            public boolean accept(File filepath) {
                return true;
            }
        };
        File corpusDirectory = new File("C:/Users/…/TestTrainSetAllTypes");
        String documentEncoding = "UTF-8";
        traincorpus.populate(corpusDirectory.toURI().toURL(),
                acceptAllFileFilter, documentEncoding, false);

        // set the corpus for the classifier
        trainclass.setCorpus(traincorpus);

        // run classifier
        trainclass.execute();

    } // trainClassifier
} // class
Mark Greenwood
It's difficult to know exactly what the problem is without seeing your application, but that error message is slightly misleading. It's not that you've run out of memory, but that you are creating lots of objects and garbage collection can't keep up - it's spending more time trying to free memory than it is doing actual work. If you are running code from within Eclipse, are you sure you are correctly deleting GATE resources once you've finished with them? Do you have code etc. you can share where we can see what might be the problem?

Mark
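For reference, releasing a document once you have finished with it typically looks something like the sketch below; the class and method names are just illustrative, not taken from Jim's application:

import gate.Corpus;
import gate.Document;
import gate.Factory;

// Sketch: process each document in turn and release it afterwards so it can
// be garbage collected.
public class ReleaseDocumentsSketch {

    public static void processAndRelease(Corpus corpus) throws Exception {
        for (int i = 0; i < corpus.size(); i++) {
            Document doc = corpus.get(i);
            try {
                // ... per-document processing would go here ...
            } finally {
                corpus.unloadDocument(doc);   // only has an effect for datastore-backed corpora
                Factory.deleteResource(doc);  // release the document from GATE's resource pool
            }
        }
    }
}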
Jan Dedek
Hi Jim,

I think that the Gate.l4j.ini file is only for the GATE GUI. If you are running your own new Java application built in Eclipse, you need to add the -Xmx8G param to that application. See e.g. this article for Eclipse: http://planetofbits.com/eclipse/increase-jvm-heap-size-in-eclipse/

JD
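For example, in Eclipse this goes in the application's run configuration, under the Arguments tab in the "VM arguments" box; the values below just mirror the heap settings mentioned elsewhere in the thread:

-Xmx8G -Xms1200m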
jim.baird@...
Hi
I am running a GATE ML Classification Training PR as part of a new Java application built in Eclipse. I have successfully run the PR on a smaller training set, but the accuracy was very low and had a strong bias to one of the class types. I have increased the size of the training sets and have roughly equal numbers of each class in the training set (to remove bias). I have increased the Gate.l4j.ini levels to -Xmx8G and -Xms1200m (my PC only has 8 GB); looking at the task manager, the total RAM usage during the process hovers just above 2 GB. The size of all the documents is 54 MB, about 140 documents in total. My instances are each document, and the features are the TYPE - token, FEATURE - string within each document.

I am getting the error below. I am wondering if I am simply exceeding the memory capacity because of the number of features / size of the training set, but the task manager suggests otherwise. It would be great to get any suggestions as to how I could avoid the limit, or if there is another memory setting I need to modify.

Regards

Jim

An exception occurred during processing of documents, no training will be done
Exception was class java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOf(Arrays.java:3181)
	at java.util.ArrayList.grow(ArrayList.java:265)
	at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)
	at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)
	at java.util.ArrayList.add(ArrayList.java:462)
	at gate.util.SimpleSortedSet.add(SimpleSortedSet.java:88)
	at gate.jape.SinglePhaseTransducer.addAnnotationsByOffset(SinglePhaseTransducer.java:232)
	at gate.jape.SinglePhaseTransducer.transduce(SinglePhaseTransducer.java:262)
	at gate.jape.MultiPhaseTransducer.transduce(MultiPhaseTransducer.java:188)
	at gate.jape.Batch.transduce(Batch.java:203)
	at gate.creole.Transducer.execute(Transducer.java:177)
	at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
	at gate.creole.tokeniser.DefaultTokeniser.execute(DefaultTokeniser.java:186)
	at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
	at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:172)
	at gate.creole.SerialController.executeImpl(SerialController.java:157)
	at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:225)
	at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:132)
	at octopus1_1.TrainClassifier.trainClassifier(TrainClassifier.java:55)
	at octopus1_1.Main.main(Main.java:38)