Export Gedcom File - Place Tidy


dave@...
 

Mike

There is a routine in the plugin that identifies similar places and tidies them as part of the export.

Would it be possible to extract that routine to identify such places, but not correct them?

I feel that that would be a useful plugin in its own right.

Many thanks

Dave
FH v7


Mike Tate
 

Dave,

I’m not sure what you mean by “identifies similar places”.

Are you thinking of the Conflicting Place Record Names warning with ‘Result Set’ and ‘Acknowledge’ options?

If you choose the ‘Result Set’ option then it does list them.

 

The plugin removes blank comma-separated Place parts and ensures every comma is followed by a space.

That is essentially the same Place name tidying that FH performs in Diagrams and Reports.

If the Plugin exported GEDCOM includes Place records, and the tidied Place names of two records are the same, then it raises the warning.
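
As a rough illustration of that tidying rule, here is a minimal Lua sketch. It is not the plugin's actual code: the function name tidyPlace is made up for this example, and collapsing runs of spaces inside a part is an added assumption.

```lua
-- Hypothetical helper: tidy a Place name as described above.
-- Blank comma-separated parts are dropped and the remaining
-- parts are joined with a comma followed by one space.
function tidyPlace(name)
  local parts = {}
  -- The appended comma makes sure the final part is captured too.
  for part in (name .. ","):gmatch("([^,]*),") do
    -- Trim the part, then collapse runs of whitespace (assumption).
    part = part:match("^%s*(.-)%s*$"):gsub("%s+", " ")
    if part ~= "" then
      parts[#parts + 1] = part
    end
  end
  return table.concat(parts, ", ")
end

print(tidyPlace("Salford,  Lancashire"))  --> Salford, Lancashire
```

Two Place records whose names give the same result from a function like this are the ones that would trigger the warning.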

 

So is it only a list of such conflicting Place record names that you are proposing would be useful?

Why not use the Result Set produced by the Export Gedcom File plugin?

I suspect a review of Tools > Work with Data > Places or the Records Window Places tab would soon spot such conflicts.

 

As a challenge, why not write the Plugin yourself?

 

Mike


dave@...
 

Mike,

What I am trying to achieve is to identify where tidying a place name would bring it into conflict with an existing record.
e.g. Salford,  Lancashire [Id 403] would tidy to Salford, Lancashire [Id 167]

The owner of the file often makes little errors like this and I am looking for a way to find them easily.

The gedcom currently has in excess of 13,000 place entries (unfortunately they use Place as a combination of Address and Place).

However what you have said has given me some pointers and so I'll have a go.

Many thanks

Dave


Andrew Braid
 

Mike

Have you tried clicking the Reverse Display Order box in the bottom right-hand corner of the Place List window? It might help you to merge some of the entries easily.

Andrew



Paul Sillitoe
 

I've been following these data standards queries with great interest since joining the list a few months ago. They are the sort of data management issues that I have been working with as an archivist over the last three decades. My approach has distilled down to a couple of over-riding principles.

1. Where a national or international data standard already exists, go with it. If it doesn't exactly meet your particular needs, work with others to change the standard rather than making individual local changes on the fly. Local changes reduce general interoperability and create data migration problems, such as have been noted on this list.
Of course, nothing mandates anyone to do anything other than what they wish to do; it's just that an individualised approach can result in the sorts of issues lately raised.

2. Manage data to the lowest level of granularity. Again, this facilitates easier data management and migration. For example, in this context, splitting down place names into individual units of administration, with qualifiers. I'm not familiar with the standard that cites "Salford,  Lancashire [Id 403] would tidy to Salford, Lancashire [Id 167]", but have, for example, recently had to deal with instances of "Salford, Lancashire" and "Salford, Manchester" in the same set of records. Generally, I qualify a local administrative unit place name with the dates of existence of that unit.

There is so much that can be said on this fascinating topic. It's not that it gets in the way of the actual research, but rather that it minimises individual issues at the data entry stage, which might otherwise aggregate into macro problems across a large dataset.

Best regards 

Paul Sillitoe


Sent from my Samsung Galaxy smartphone but not so smart as to usefully auto-correct the typos from my large fingers 🙂




Mike Tate
 

Dave, if you are interested in writing a Plugin to report such conflicts then I am happy to give some tips & guidance.

I suggest it is called ‘Find Duplicate Place Names’ to tie in with all the other ‘Find Duplicate…’ Plugins.

 

Mike Tate

 


Mike Tate
 

Paul,

Thank you for that observation which I generally agree with, but I suspect you have missed the point.

It is nothing to do with administrative place name standards.

The difference between

“Salford,  Lancashire” held in Place Record Id [403]

and

“Salford, Lancashire” held in Place Record Id [167]

is that the first has two spaces and the second only one space, so when ‘tidied’ the first is identical to the second.

But two separate Place records are not allowed to have the same name and therefore must be merged.

It is such Place Records that Dave needs to identify.

 

Mike Tate

 


Victor Markham
 

What I do is open the Tools menu at the top, choose 'Work with Data', then Places.

Type the place name and, if 2 or more similar names appear, highlight them, then choose 'Merge', picking the correct one. All the highlighted places will then be merged into one single correct place name.

Victor


 


Paul Sillitoe
 

Thanks Mike, but I think that it was a comment of more general relevance rather than being aimed at one specific point.

Best

Paul




 


Mike Tate
 

Sorry Paul, but in that case, you should have started a new topic, because this thread is very focussed on just tidying commas & spaces in Place names.

 


 


dave@...
 

Mike,

Thanks for that.
I've got a couple of ideas for the methodology floating around my head at the moment.
I will ponder them over the weekend, and have a look on Tuesday to see how far I can get.
Thanks for the offer of guidance.

Dave


Paul Sillitoe
 

If messages to this list are so topic specific, why are they posted as individual messages rather than as threads? Just asking...




 


Mike Tate
 

It would be unworkable to keep the entire thread history in every message posted and the Subject line identifies the topic.

Although, if you don’t delete the earlier postings that is what will happen, but we are advised against that by the Admins.

Follow the link below to ‘View/Reply Online’ with a unique thread number and click ‘View All Messages In Topic’.

Then you will see the complete thread just like in most forums.

 

 



colevalleygirl@colevalleygirl.co.uk
 

Paul,

 

I’m not sure Mike has any authority to dictate how this list works, but the responses are actually threaded as you’ll see if you go to https://groups.io/g/family-historian/topics.

 


 


Mike Tate
 

There I go speaking out of turn and acting like a dictator again!

 


 


Paul Sillitoe
 


No problem and none taken 😁😁

All best

Paul



 


Mike Tate
 

Personally, I think this would be easier to manage in the FHUG ‘Plugin Discussions’ Forum.

There are certainly some shortcuts to searching the Place list, getting the Record Id, and tidying & tracking duplicate names.

I would advise against the Plugin doing any actual record merging as there may be conflicting data in the two copies.

Mike

 

From: family-historian@groups.io <family-historian@groups.io> On Behalf Of dave@...
Sent: 06 March 2021 16:27
To: family-historian@groups.io
Subject: Re: [family-historian] Export Gedcom File - Place Tidy

 

Right, I feel as though I'm stood on top of a 30 metre diving board with only 1 metre of water below me!

Considering that this topic is moving towards being about a plugin, does it need moving to Plugins?


Find Duplicate Place Names plugin

These are the thoughts buzzing round my head at the moment.

 

Aim

Family Historian doesn’t allow duplicate Place Names.

However, if we try to tidy malformed Place Names, this may result in duplicates.

The aim is to identify Place Names which, after tidying, would duplicate an existing Place Name.

e.g. 

“Salford,  Lancashire” held in Place Record Id [403] – Has 2 adjacent spaces before Lancashire

“Salford, Lancashire” held in Place Record Id [167] – No problems with this entry

 

However, if we tidied the first entry then it would create a duplicate.

 

Step 1 – Define the criteria to identify a candidate for tidying

Note: I have used the diamond symbol ◊ to represent a space for clarity

 

My initial criteria would be:

Multiple adjacent spaces       e.g.      ‘Bradford,◊◊Yorkshire, England’

Multiple adjacent commas*      e.g.      ‘Bradford,, Yorkshire, England’

Leading space                  e.g.      ‘◊Bradford, Yorkshire, England’

Leading comma*                 e.g.      ‘,Bradford, Yorkshire, England’

Trailing space                 e.g.      ‘Bradford, Yorkshire, England◊’

Trailing comma                 e.g.      ‘Bradford, Yorkshire, England,’

Space preceding a comma        e.g.      ‘Bradford◊, Yorkshire, England’

 

*It is possible that a user may intentionally use extra commas to force positioning
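
The criteria above map fairly directly onto Lua string patterns, which are probably the nearest built-in equivalent of grep. The following is only a sketch: the labels come from the list, but the patterns, the CHECKS table and the findIssues name are assumptions made up for illustration. Note that the "multiple adjacent commas" pattern also catches commas separated only by spaces.

```lua
-- Each entry pairs a criterion label with a Lua pattern (assumed, not from FH).
local CHECKS = {
  { "multiple adjacent spaces", "%s%s"   },
  { "multiple adjacent commas", ",%s*,"  },
  { "leading space",            "^%s"    },
  { "leading comma",            "^%s*,"  },
  { "trailing space",           "%s$"    },
  { "trailing comma",           ",%s*$"  },
  { "space preceding a comma",  "%s,"    },
}

-- Hypothetical helper: return the labels of every criterion the name matches.
function findIssues(name)
  local found = {}
  for _, check in ipairs(CHECKS) do
    if name:find(check[2]) then
      found[#found + 1] = check[1]
    end
  end
  return found
end

-- An empty result means the name needs no tidying under these rules.
print(#findIssues("Bradford, Yorkshire, England"))            --> 0
print(findIssues("Bradford,  Yorkshire, England")[1])         --> multiple adjacent spaces
```

Because of the footnote about intentional extra commas, the two comma checks might be better made optional.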

 

Step 2 – Create a list of all Places (placelist)

Do I need to step through the GedCom or can I access FH’s internal list?

My initial thought was that fhGetDataList("PLACES") would work, but whilst this returns a list of Places, it doesn’t return the record Id.

Do I need the record ID? It would be useful.
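
One possible way to get both the name and the record Id is to walk the Place records with an item pointer instead. This is an untested sketch that only works inside Family Historian; it assumes the plugin API functions fhNewItemPtr, fhGetRecordId and fhGetItemText behave as documented, that Place records use the "_PLAC" tag, and that the name is held in the "TEXT" field. The function name listPlaceRecords is made up.

```lua
-- Hypothetical helper: collect every Place record as { id = ..., name = ... }.
-- Relies on the fh* functions that Family Historian provides to plugins.
function listPlaceRecords()
  local places = {}
  local ptr = fhNewItemPtr()
  ptr:MoveToFirstRecord("_PLAC")        -- assumed tag for Place records
  while ptr:IsNotNull() do
    places[#places + 1] = {
      id   = fhGetRecordId(ptr),
      name = fhGetItemText(ptr, "~.TEXT"),  -- assumed field holding the Place name
    }
    ptr:MoveNext()
  end
  return places
end
```

Calling listPlaceRecords() from a plugin would then give a placelist where each entry carries its record Id, so later steps can report it.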

 

Step 3 – Identify any Place that matches the criteria shown in Step 1

Add offending Places to a list

Search through placelist and look for any offending places
Could use the Lua equivalent of Grep?

Copy the record into a list invalidlist

Remove from placelist any entries described as invalid
This means that at the end, placelist will contain only valid Places

 

 

Step 4  - Tidy the Places that need tidying

This needs to be done based on the invalid list created in Step 3

Step through invalidlist and correct the errors
Could use the Lua equivalent of Grep?

 

At this point we have 2 lists

placelist – contains valid Places

invalidlist – contains invalid Places that have now been corrected

 

Step 5 – Compare the tidied list with existing Places to find duplicates

Compare all entries in invalidlist with placelist

If there are any matches, then these need to be identified

Need to return the Place name and Id from invalidlist and the matching values in placelist

 

It would also be useful to return those Places identified as invalid but that don’t clash with valid Places. This would allow the user the opportunity to correct them.
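
Steps 3 to 5 could perhaps be collapsed into a single pass. The sketch below is an assumption-laden alternative to keeping two separate lists: it groups records by their tidied name using a Lua table as a lookup, so any group with more than one record is a clash. The names findDuplicatePlaces, clashes and loners are made up, and the tidy rule is passed in as a function so it can be as strict or as lenient as needed.

```lua
-- places: array of { id = number, name = string }
-- tidy:   function mapping a Place name to its tidied form
-- Returns two tables:
--   clashes: tidied name -> array of records sharing that tidied name
--   loners:  records that need tidying but clash with nothing
function findDuplicatePlaces(places, tidy)
  local byTidy = {}
  for _, place in ipairs(places) do
    local key = tidy(place.name)
    byTidy[key] = byTidy[key] or {}
    table.insert(byTidy[key], place)
  end

  local clashes, loners = {}, {}
  for key, group in pairs(byTidy) do
    if #group > 1 then
      clashes[key] = group
    elseif group[1].name ~= key then
      loners[#loners + 1] = group[1]
    end
  end
  return clashes, loners
end

-- Usage with a deliberately simple tidy rule: one space after every comma.
local function oneSpaceAfterComma(name)
  return (name:gsub(",%s*", ", "))
end

local clashes = findDuplicatePlaces({
  { id = 403, name = "Salford,  Lancashire" },
  { id = 167, name = "Salford, Lancashire" },
}, oneSpaceAfterComma)

for _, place in ipairs(clashes["Salford, Lancashire"]) do
  print(place.id, place.name)
end
```

This only reports the clashes and leaves any merging to the user.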

Dave


dave@...
 

I have now moved my post about the plugin to the plugin forum.

Dave