Out of Memory Error


jim.baird@...
 

Hi Mark

 

Many thanks, that seems to work fine. I think the only thing that has changed is that the “adopt” method now just takes the LR as a parameter (not the security info).
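
For anyone else hitting the same compile error: the change is just the dropped SecurityInfo argument. A minimal sketch, assuming an already-open SerialDataStore named sds and a transient corpus named corpus (both hypothetical names, not from the thread):

    // Older GATE releases took a security-info parameter:
    //   Corpus persistent = (Corpus) sds.adopt(corpus, secInfo);
    // Current releases adopt the language resource on its own:
    Corpus persistent = (Corpus) sds.adopt(corpus);
    sds.sync(persistent);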

 

Best regards

 

Jim

 

 

From: gate-users@groups.io <gate-users@groups.io> On Behalf Of Mark Greenwood
Sent: 01 August 2018 19:26
To: gate-users@groups.io; jim.baird@...
Subject: Re: [gate-users] Out of Memory Error

 

Looking at that code, the problem is that you are loading all the documents into memory at the same time. Yes, the documents need to be in a corpus, but no, they don't need to be in memory. What you need to do is put the corpus into a SerialDataStore before you populate it. That will ensure the documents get saved to disk, and the controller will then load each one in turn and remove it from memory when done.

There is example code for using a serial datastore at https://gate.ac.uk/wiki/code-repository/src/sheffield/examples/DataStoreApplication.java -- I think the code should still work although I don't know when it was last tested.
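
A rough sketch of that datastore route, for reference (this is not code from the thread or from DataStoreApplication.java; the datastore path, variable names and sync points are assumptions, and the populate arguments are the ones from Jim's code further down):

    // Create a new SerialDataStore backed by a directory on disk.
    // (Factory.openDataStore(...) would be used to reopen an existing one.)
    File dsDir = new File("C:/GateDatastores/train-ds");   // hypothetical location
    DataStore sds = Factory.createDataStore("gate.persist.SerialDataStore",
                                            dsDir.toURI().toURL().toString());

    // Adopt an empty corpus into the datastore *before* populating it.
    Corpus transientCorpus = Factory.newCorpus("Training corpus");
    Corpus trainCorpus = (Corpus) sds.adopt(transientCorpus);
    sds.sync(trainCorpus);

    // Documents added by populate() are now backed by the datastore, so the
    // controller can load and unload them one at a time rather than keeping
    // the whole corpus in RAM.
    trainCorpus.populate(corpusDirectory.toURI().toURL(), acceptAllFileFilter, "UTF-8", false);
    sds.sync(trainCorpus);

    trainclass.setCorpus(trainCorpus);
    trainclass.execute();
    sds.close();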

Mark

 

On 01/08/18 19:04, jim.baird@... wrote:

Mark, JD

 

Many thanks. The code is below (there is also other document pre-processing code where I create a Key set and add the class features, but I am not sure that is useful here other than to confirm that I am creating a lot of features).

 

I seem to recall reading in the materials that the Learning Framework works on all documents in the corpus simultaneously (rather than sequentially), but if it were possible to do some or all of the processing of documents in the corpus sequentially (and then remove them), that might have a 100-fold impact...? Otherwise, I was not clear on how I could delete resources while the ML was running. I have tried restarting and closing memory-intensive applications.

 

My assumption was that the bottleneck was not in Eclipse, because Eclipse pauses while the ML runs in GATE. I have increased Eclipse to 3 GB and still get the Out of Memory message.

 

Best regards

 

Jim

 

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

import javax.swing.SwingUtilities;

import gate.Corpus;
import gate.CorpusController;
import gate.Factory;
import gate.Gate;
import gate.creole.ResourceInstantiationException;
import gate.gui.MainFrame;
import gate.persist.PersistenceException;
import gate.util.GateException;
import gate.util.persistence.PersistenceManager;

public class TrainClassifier {

    public void trainClassifier() throws IOException, GateException,
            InvocationTargetException, InterruptedException {

        System.out.println("Loading Learning Framework");
        Gate.init();
        SwingUtilities.invokeAndWait(() -> MainFrame.getInstance().setVisible(true));

        // Load the application from its saved state (.xgapp)
        System.out.println("Loading Training Module");
        CorpusController trainclass = (CorpusController)
                PersistenceManager.loadObjectFromFile(new File("C:\\Users\\....xgapp"));

        // Create and populate the training corpus (held entirely in memory)
        Corpus traincorpus = Factory.newCorpus("Training corpus");
        FileFilter acceptAllFileFilter = filepath -> true;
        File corpusDirectory = new File("C:/Users/…/TestTrainSetAllTypes");
        String documentEncoding = "UTF-8";
        traincorpus.populate(corpusDirectory.toURI().toURL(), acceptAllFileFilter,
                documentEncoding, false);

        // Set the corpus for the classifier
        trainclass.setCorpus(traincorpus);

        // Run the classifier
        trainclass.execute();

    } // trainClassifier
} // class

 

From: gate-users@groups.io <gate-users@groups.io> On Behalf Of Mark Greenwood
Sent: 31 July 2018 15:43
To: gate-users@groups.io
Subject: Re: [gate-users] Out of Memory Error

 

It's difficult to know exactly what the problem is without seeing your application, but that error message is slightly misleading. It's not that you've run out of memory, but that you are creating lots of objects and garbage collection can't keep up - it's spending more time trying to free memory than it is doing actual work. If you are running code from within Eclipse, are you sure you are correctly deleting GATE resources once you've finished with them? Do you have code etc. you can share so we can see what might be the problem?
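
As a minimal illustration of the clean-up Mark is asking about (my sketch, using the variable names from Jim's code above; whether per-document clean-up helps a corpus-level training PR is a separate question, which Mark's datastore suggestion earlier in the thread addresses):

    // After the controller has finished, explicitly delete the transient GATE
    // resources: until Factory.deleteResource() is called they stay registered
    // with GATE and so cannot be garbage collected.
    trainclass.execute();

    for (gate.Document doc : new java.util.ArrayList<gate.Document>(traincorpus)) {
        Factory.deleteResource(doc);         // release each loaded document
    }
    Factory.deleteResource(traincorpus);     // then the corpus itself
    Factory.deleteResource(trainclass);      // controllers are resources too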

 

Mark

 

On Tue, 31 Jul 2018, 15:37 Jan Dedek, <dedekj@...> wrote:

Hi Jim,

I think that the Gate.l4j.ini file is only for the GATE GUI. If you are running your own Java application built in Eclipse, you need to add the -Xmx8G param to that application itself.

See e.g. this article for Eclipse: http://planetofbits.com/eclipse/increase-jvm-heap-size-in-eclipse/
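
Concretely, assuming a standard Eclipse setup, that means the VM arguments of the run configuration that launches the class (Run > Run Configurations... > Arguments > VM arguments), e.g. the values Jim already uses:

    -Xmx8G -Xms1200m

Gate.l4j.ini only affects the GATE Developer GUI, as Jan says, and eclipse.ini only affects the IDE's own JVM, so neither reaches a program launched from a run configuration.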

JD

 

On Tue, 31 Jul 2018 at 16:02, <jim.baird@...> wrote:

Hi 

I am running a Gate ML Classification Training PR as part of a new Java application built in Eclipse. 

I have successfully run the PR on a smaller training set, but the accuracy was very low and had a strong bias to one of the class types. 

I have increased the size of the training sets and have roughly equal numbers of each class in the training set (to remove bias).

I have increased the Gate.l4j.ini levels to -Xmx8G and -Xms1200m (my PC only has 8 GB).

Looking at Task Manager, the total RAM usage during the process hovers just above 2 GB. The total size of the documents is 54 MB, about 140 documents in all. My instances are whole documents, and the features are TYPE = Token, FEATURE = string within each document.
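
For a document-classification setup like that, the Learning Framework feature specification XML would presumably look roughly like the sketch below. This is reconstructed from Jim's description rather than taken from his actual file, so the element names should be checked against the Learning Framework documentation:

    <ML-CONFIG>
      <NGRAM>
        <NUMBER>1</NUMBER>
        <TYPE>Token</TYPE>
        <FEATURE>string</FEATURE>
      </NGRAM>
    </ML-CONFIG>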

I am getting the error below. 

I am wondering if I am simply exceeding the memory capacity because of the number of features / size of the training set, but Task Manager suggests otherwise.

It would be great to get any suggestions as to how I could avoid the limit or if there is another memory setting I need to modify.

Regards

Jim



An exception occurred during processing of documents, no training will be done
Exception was class java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
 at java.util.Arrays.copyOf(Arrays.java:3181)
 at java.util.ArrayList.grow(ArrayList.java:265)
 at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)
 at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)
 at java.util.ArrayList.add(ArrayList.java:462)
 at gate.util.SimpleSortedSet.add(SimpleSortedSet.java:88)
 at gate.jape.SinglePhaseTransducer.addAnnotationsByOffset(SinglePhaseTransducer.java:232)
 at gate.jape.SinglePhaseTransducer.transduce(SinglePhaseTransducer.java:262)
 at gate.jape.MultiPhaseTransducer.transduce(MultiPhaseTransducer.java:188)
 at gate.jape.Batch.transduce(Batch.java:203)
 at gate.creole.Transducer.execute(Transducer.java:177)
 at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
 at gate.creole.tokeniser.DefaultTokeniser.execute(DefaultTokeniser.java:186)
 at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
 at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:172)
 at gate.creole.SerialController.executeImpl(SerialController.java:157)
 at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:225)
 at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:132)
 at octopus1_1.TrainClassifier.trainClassifier(TrainClassifier.java:55)
 at octopus1_1.Main.main(Main.java:38)

 

