Reserved de-dupe rules
I want to refine the de-duping rules, but first I'd like to find out exactly what the predefined rules are before I create my own. They are reserved, so you can't edit them, which is fine, but the rule screen only tells you which fields they use and not what the weights and thresholds are, so the full behaviour is not clear.
duplicate-contacts
asked Apr 4 at 17:09
Mick Kahn
2 Answers
Demerit's and Mick's answers are incorrect for the (built-in) reserved rules - though it's definitely confusing!
If a RuleGroup has a value in the name field, and that name corresponds to a filename in CRM/Dedupe/BAO/QueryBuilder, then the customized SQL in that file is used instead. The existing entries in civicrm_dedupe_rule for those RuleGroups are holdovers from before that system existed, and editing them has no effect.
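As a rough illustration of the lookup described above (a sketch of the idea only, not CiviCRM's actual PHP code), the decision could be modelled like this. The function name and the set of file names are hypothetical stand-ins for a directory listing of CRM/Dedupe/BAO/QueryBuilder:

```python
# Hypothetical stand-in for the files found in CRM/Dedupe/BAO/QueryBuilder.
# Only IndividualUnsupervised.php is named in this thread; the others are
# illustrative assumptions.
QUERY_BUILDER_FILES = {
    "IndividualSupervised.php",
    "IndividualUnsupervised.php",
    "IndividualGeneral.php",
}


def uses_handwritten_sql(rule_group_name, available_files=QUERY_BUILDER_FILES):
    """Return True when a rule group would bypass its stored rule rows."""
    if not rule_group_name:
        # Empty or NULL name: an ordinary rule driven by weights in the database.
        return False
    return f"{rule_group_name}.php" in available_files


print(uses_handwritten_sql("IndividualUnsupervised"))
print(uses_handwritten_sql("MyCustomRule"))
```

Under these assumptions, a rule group whose name has no matching file falls back to the weighted rows stored in the database.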
"Standard" dedupe rules with multiple criteria are very inefficient compared to handwritten SQL, which is why this is a valuable technique. You can create your own handwritten queries with hook_civicrm_dupeQuery, and the Veda dedupe extension has a number of excellent examples. Note that this extension doesn't work on modern Civi because of some of its other functions, but the dedupe rules can be lifted out into another extension.
Finally - I learned just yesterday that the built-in handwritten dedupe rules seem to execute different SQL when comparing in Unsupervised/Supervised mode (a single contact) vs. General mode (find all dupes). While I haven't proved it, I suspect that if you're in the rare scenario of needing to optimize your unsupervised/supervised dedupes, creating a new class to extend CRM_Dedupe_BAO_QueryBuilder is the way to go. I just posted org.agbu.optimizeddedupe to provide an example of this.
UPDATE: More clarification.
To understand how the queries work, it's best to look at an example, e.g. IndividualUnsupervised.php.
The function named internal in that file is used if you go to Contacts » Find and Merge Duplicate Contacts and click Use Rule. The SQL query is:
    SELECT contact1.id as id1, contact2.id as id2, $rg->threshold as weight
    FROM civicrm_contact as contact1
      JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
      JOIN civicrm_contact as contact2 ON
        contact1.first_name = contact2.first_name AND
        contact1.last_name = contact2.last_name
      JOIN civicrm_email as email2 ON
        email2.contact_id=contact2.id AND
        email1.email=email2.email
    WHERE contact1.contact_type = 'Individual'
First, note that the weight is set to $rg->threshold - that is, the threshold in civicrm_dedupe_rule_group. In other words, if this SQL matches, these records automatically meet the threshold for that rule. Hopefully that answers your main question! If you remove that field, you can run this SQL in a SQL client and get a list of the duplicates it would return. As quoted here, though, the query also pairs a contact with itself and lists each pair twice, so you'll probably want to add AND contact1.id < contact2.id to the WHERE clause.
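To try the query without touching a live database, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The two tables are cut down to the columns the query uses, the sample contacts are invented, the $rg->threshold column is dropped, and the contact1.id < contact2.id filter is an addition so that self-matches and mirrored pairs are left out. SQLite compares strings case-sensitively by default, which may differ from your MySQL collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE civicrm_contact (
  id INTEGER PRIMARY KEY, contact_type TEXT, first_name TEXT, last_name TEXT);
CREATE TABLE civicrm_email (
  id INTEGER PRIMARY KEY, contact_id INTEGER, email TEXT);
-- Invented sample data: contacts 1 and 2 share name and email.
INSERT INTO civicrm_contact VALUES
  (1, 'Individual', 'Ann', 'Lee'),
  (2, 'Individual', 'Ann', 'Lee'),
  (3, 'Individual', 'Ann', 'Lee'),
  (4, 'Individual', 'Bob', 'Ray');
INSERT INTO civicrm_email VALUES
  (1, 1, 'ann@example.org'),
  (2, 2, 'ann@example.org'),
  (3, 3, 'other@example.org'),
  (4, 4, 'ann@example.org');
""")

DUPES_SQL = """
SELECT contact1.id AS id1, contact2.id AS id2
FROM civicrm_contact AS contact1
JOIN civicrm_email AS email1 ON email1.contact_id = contact1.id
JOIN civicrm_contact AS contact2 ON
  contact1.first_name = contact2.first_name AND
  contact1.last_name = contact2.last_name
JOIN civicrm_email AS email2 ON
  email2.contact_id = contact2.id AND
  email1.email = email2.email
WHERE contact1.contact_type = 'Individual'
  AND contact1.id < contact2.id   -- added: skip self-matches and mirrored pairs
ORDER BY id1, id2
"""

duplicate_pairs = conn.execute(DUPES_SQL).fetchall()
print(duplicate_pairs)
```

Contact 3 shares the name but not the email, and contact 4 shares the email but not the name, so neither is paired with contact 1: all three fields have to match.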
To further clarify - unlike "regular" rules which are the result of several queries, each with their own weight - this runs a SINGLE query, and sets the weight equal to the rule's threshold. So it's a straight yes/no answer whether a record is a duplicate, based on whether the SQL finds them.
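The difference between the two styles can be sketched in a few lines of Python. This is an illustration of the idea rather than CiviCRM's code: the function names, the weights and the threshold of 20 are made up, and the "score at or above threshold" comparison is an assumption about how a weighted rule decides.

```python
def weighted_rule_is_duplicate(matched_fields, weights, threshold):
    """Standard-style rule: add up the weight of every field that matched."""
    score = sum(weights[field] for field in matched_fields if field in weights)
    return score >= threshold


def handwritten_rule_is_duplicate(pair, pairs_returned_by_sql):
    """Reserved-style rule: a pair is a duplicate if the single query returned it."""
    return pair in pairs_returned_by_sql


# Hypothetical weights and threshold for a custom rule.
weights = {"first_name": 5, "last_name": 7, "email": 10}
threshold = 20

partial_match = weighted_rule_is_duplicate(["first_name", "email"], weights, threshold)
full_match = weighted_rule_is_duplicate(["first_name", "last_name", "email"], weights, threshold)

# Pretend these pairs came back from the single handwritten query.
sql_pairs = {(1, 2)}
found = handwritten_rule_is_duplicate((1, 2), sql_pairs)
not_found = handwritten_rule_is_duplicate((1, 3), sql_pairs)
print(partial_match, full_match, found, not_found)
```

With a weighted rule a partial match can fall short of the threshold or reach it depending on the numbers; with the handwritten query there is no partial score at all.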
That's not to say that you can't simulate length/weight, but it's tricky. My org.agbu.optimizeddedupe extension has a SQL statement you can look at which gives the same results as this rule.

With the existing rule, it took about 5 seconds to compare even a single submitted contact against the existing 165,000 contacts in this database. With the rewritten SQL it's almost instantaneous.
edited 2 days ago
answered 2 days ago
Jon G - Megaphone Tech
Thanks. Good lord...
– Demerit
2 days ago
Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules, and I know I can make my own rules supervised or unsupervised and leave the preconfigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc. that you have pointed to, so do they still come from the queries that Demerit has provided? They are certainly compatible with the results that I see.
– Mick Kahn
2 days ago
I'm still a bit confused as I'm not that good at reading the code, though I can work out enough to see that these rules are treated as a special case. I can't see any values for length, weight or threshold there, so I still don't know the actual criteria for considering contacts to be duplicates under these rules. Or is some different algorithm used? Given that I have a relatively small number of contacts, I can just ditch the preconfigured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.
– Mick Kahn
2 days ago
@MickKahn I just updated my answer, hopefully it makes things clearer!
– Jon G - Megaphone Tech
2 days ago
Thanks Jon - I don't understand it all, but I see that my answer is not helpful so I will delete it. I'm seeing some other unexpected results, but can achieve what I need with my own rules, so will move on to other things for now.
– Mick Kahn
2 days ago
add a comment |
Thanks. Good lord...
– Demerit
2 days ago
Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules and I know I can make my own rules supervised or unsupervised and leave the prefigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc that you have pointed to, so do they still come from the queries that Deremit has provided. They are certain;y compatible with the results that I see.
– Mick Kahn
2 days ago
I'm still a bit confused as I'm not that good at reading the code, though I can work out enough see that these rules are treated a special case. I can't see any values for length, weight or threshold there so still don't know the actual criteria for considering contacts to be duplicate under these rules. Or is some different algorithm used. Given that I have a relatively small number of contacts, I can just ditch the pre-configured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.
– Mick Kahn
2 days ago
@MickKahn I just updated my answer, hopefully it makes things clearer!
– Jon G - Megaphone Tech
2 days ago
Thanks Jon - I don't understand it all, but see that my answer is not helpful so will delete that. I'm seeing some other unexpected results, but can acheive what I need with my own rules, so will move on to other things for now.
– Mick Kahn
2 days ago
Thanks. Good lord...
– Demerit
2 days ago
Thanks. Good lord...
– Demerit
2 days ago
Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules and I know I can make my own rules supervised or unsupervised and leave the prefigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc that you have pointed to, so do they still come from the queries that Deremit has provided. They are certain;y compatible with the results that I see.
– Mick Kahn
2 days ago
Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules and I know I can make my own rules supervised or unsupervised and leave the prefigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc that you have pointed to, so do they still come from the queries that Deremit has provided. They are certain;y compatible with the results that I see.
– Mick Kahn
2 days ago
I'm still a bit confused as I'm not that good at reading the code, though I can work out enough see that these rules are treated a special case. I can't see any values for length, weight or threshold there so still don't know the actual criteria for considering contacts to be duplicate under these rules. Or is some different algorithm used. Given that I have a relatively small number of contacts, I can just ditch the pre-configured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.
– Mick Kahn
2 days ago
@MickKahn I just updated my answer, hopefully it makes things clearer!
– Jon G - Megaphone Tech
2 days ago
Thanks Jon - I don't understand it all, but I see that my answer is not helpful, so I will delete it. I'm seeing some other unexpected results, but I can achieve what I need with my own rules, so I will move on to other things for now.
– Mick Kahn
2 days ago
EDIT: This answer is wrong. See Jon's answer. The reserved rules don't use the values in the database; they use custom queries.
If you have access to the database, run:
SELECT * FROM civicrm_dedupe_rule r INNER JOIN civicrm_dedupe_rule_group rg ON rg.id = r.dedupe_rule_group_id;
which will give you a table that isn't pretty but is mostly understandable.
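For ordinary (non-reserved) rules, the rows from that query can be read roughly as "each matching field adds its weight, and two contacts count as duplicates once the total reaches the group's threshold". The Python below is only a sketch of that idea, not CiviCRM's actual code. The `Rule` class, the helper functions, the case-insensitive comparison, and all weights and the threshold are hypothetical illustrations, and they are not the values used by the reserved rules, which rely on custom queries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A contact is modelled as a mapping from (table, field) to a value.
Contact = Dict[Tuple[str, str], str]


@dataclass(frozen=True)
class Rule:
    """One row of a dedupe rule group (hypothetical model)."""
    table: str
    field: str
    weight: int
    length: Optional[int] = None  # compare only the first N characters if set


def score(a: Contact, b: Contact, rules: List[Rule]) -> int:
    """Sum the weights of every rule whose field matches on both contacts."""
    total = 0
    for rule in rules:
        va = a.get((rule.table, rule.field))
        vb = b.get((rule.table, rule.field))
        if not va or not vb:
            continue  # an empty or missing value never counts as a match
        if rule.length:
            va, vb = va[:rule.length], vb[:rule.length]
        # Assumption: values are compared case-insensitively.
        if va.casefold() == vb.casefold():
            total += rule.weight
    return total


def is_duplicate(a: Contact, b: Contact, rules: List[Rule], threshold: int) -> bool:
    """Two contacts are duplicates when the summed weight reaches the threshold."""
    return score(a, b, rules) >= threshold


# Hypothetical rule group: illustrative weights and threshold only.
RULES = [
    Rule("civicrm_contact", "first_name", 5),
    Rule("civicrm_contact", "last_name", 7),
    Rule("civicrm_email", "email", 10),
]
THRESHOLD = 20

alice = {
    ("civicrm_contact", "first_name"): "Alice",
    ("civicrm_contact", "last_name"): "Smith",
    ("civicrm_email", "email"): "alice@example.org",
}
alice_dup = {
    ("civicrm_contact", "first_name"): "alice",
    ("civicrm_contact", "last_name"): "SMITH",
    ("civicrm_email", "email"): "Alice@Example.org",
}
bob = {
    ("civicrm_contact", "first_name"): "Bob",
    ("civicrm_contact", "last_name"): "Smith",
    ("civicrm_email", "email"): "bob@example.org",
}

print(score(alice, alice_dup, RULES), is_duplicate(alice, alice_dup, RULES, THRESHOLD))  # 22 True
print(score(alice, bob, RULES), is_duplicate(alice, bob, RULES, THRESHOLD))  # 7 False
```

With these made-up numbers, a shared last name alone (7) stays well below the threshold, while first name, last name and email together (22) cross it.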
Ah you're right. I'll update answer.
– Demerit
Apr 4 at 18:19
Thanks, that tells me what I need, so I have deleted my earlier comment on the previous version of your answer.
– Mick Kahn
Apr 4 at 20:22
The 'green tick' belongs on Jon's answer since the question is about reserved rules.
– Aidan♦
2 days ago
I'm taking off the green tick because Demerit has marked the answer as wrong, but (see below) I'm not offering the tick to Jon's answer yet, as it doesn't fully answer my question.
– Mick Kahn
2 days ago
edited 2 days ago
answered Apr 4 at 17:35
Demerit