Joining Salesforce Object Data
Salesforce objects cannot be joined using SOQL like tables in traditional databases, unfortunately. More context and detail here and here.
SOQL really only supports selecting data from multiple objects if there is an explicit lookup or master-detail relationship between them.
from simple_salesforce import Salesforce
import pandas as pd
Credentials below are for a trailhead org created for purposes of this note, only. It contains only default data provided by Salesforce.
sf = Salesforce(username='[email protected]',
password='password_201019',
security_token='ozSA9fb0lwwyqslXtUDxRDQu')
Examples Joins
These examples are taken from this link.
There are two approaches demonstrated below. The first involves merging the tables using Pandas. The seconds attempts a JOIN using SOQL, which has no native JOIN keyword.
Example JOINs Using Pandas
Each object is queried into its own Pandas DataFrame and then joined using Panda’s merge
function.
Child to Parent Query
Answers the question, “What is the name of our Accounts and which user owns them?”
query = '''
SELECT Id, Name, OwnerId
FROM Account
'''
records = sf.query_all(query)
account = (pd.DataFrame(records['records'])
.drop(columns=['attributes']))
account.head(3)
Id | Name | OwnerId | |
---|---|---|---|
0 | 0014W000026oyWmQAI | Express Logistics and Transport | 0054W00000CKU8EQAX |
1 | 0014W000026oyWnQAI | University of Arizona | 0054W00000CKU8EQAX |
2 | 0014W000026oyWoQAI | United Oil & Gas, UK | 0054W00000CKU8EQAX |
query = '''
SELECT Id, Name
FROM User
'''
records = sf.query_all(query)
user = (pd.DataFrame(records['records'])
.drop(columns=['attributes']))
user.columns = 'Owner' + user.columns
user.head(3)
OwnerId | OwnerName | |
---|---|---|
0 | 0054W00000CElpoQAD | Automated Process |
1 | 0054W00000CElpnQAD | Integration User |
2 | 0054W00000CElpvQAD | Security User |
result = pd.merge(account, user,
on='OwnerId',
how='left',
indicator=True)
print(result._merge.value_counts())
result.head(3)
both 12
right_only 0
left_only 0
Name: _merge, dtype: int64
Id | Name | OwnerId | OwnerName | _merge | |
---|---|---|---|---|---|
0 | 0014W000026oyWmQAI | Express Logistics and Transport | 0054W00000CKU8EQAX | Ryan Wingate | both |
1 | 0014W000026oyWnQAI | University of Arizona | 0054W00000CKU8EQAX | Ryan Wingate | both |
2 | 0014W000026oyWoQAI | United Oil & Gas, UK | 0054W00000CKU8EQAX | Ryan Wingate | both |
Parent to Child Query
Answers the question, “Who are the contacts for each Account?”
query = '''
SELECT Id, Name
FROM Account
'''
records = sf.query_all(query)
account = (pd.DataFrame(records['records'])
.drop(columns=['attributes']))
account.columns = 'Account' + account.columns
account.head(3)
AccountId | AccountName | |
---|---|---|
0 | 0014W000026oyWmQAI | Express Logistics and Transport |
1 | 0014W000026oyWnQAI | University of Arizona |
2 | 0014W000026oyWoQAI | United Oil & Gas, UK |
query = '''
SELECT Id, Name, AccountId
FROM Contact
'''
records = sf.query_all(query)
contact = (pd.DataFrame(records['records'])
.drop(columns=['attributes']))
contact.head(3)
Id | Name | AccountId | |
---|---|---|---|
0 | 0034W00002FamUcQAJ | Rose Gonzalez | 0014W000026oyWgQAI |
1 | 0034W00002FamUdQAJ | Sean Forbes | 0014W000026oyWgQAI |
2 | 0034W00002FamUeQAJ | Jack Rogers | 0014W000026oyWhQAI |
result = pd.merge(account, contact,
on='AccountId',
how='left',
indicator=True)
print(result._merge.value_counts())
result.head(3)
both 20
right_only 0
left_only 0
Name: _merge, dtype: int64
AccountId | AccountName | Id | Name | _merge | |
---|---|---|---|---|---|
0 | 0014W000026oyWmQAI | Express Logistics and Transport | 0034W00002FamUlQAJ | Babara Levy | both |
1 | 0014W000026oyWmQAI | Express Logistics and Transport | 0034W00002FamUmQAJ | Josh Davis | both |
2 | 0014W000026oyWnQAI | University of Arizona | 0034W00002FamUnQAJ | Jane Grey | both |
Query with WHERE and Combinational Logic
query = '''
SELECT Id, Name
FROM Account
WHERE Name = 'Express Logistics and Transport'
OR Name = 'University of Arizona'
'''
records = sf.query_all(query)
account = (pd.DataFrame(records['records'])
.drop(columns=['attributes']))
account.columns = 'Account' + account.columns
account.head(3)
AccountId | AccountName | |
---|---|---|
0 | 0014W000026oyWmQAI | Express Logistics and Transport |
1 | 0014W000026oyWnQAI | University of Arizona |
result = pd.merge(account, contact,
on='AccountId',
how='left',
indicator=True)
print(result._merge.value_counts())
result.head(3)
both 3
right_only 0
left_only 0
Name: _merge, dtype: int64
AccountId | AccountName | Id | Name | _merge | |
---|---|---|---|---|---|
0 | 0014W000026oyWmQAI | Express Logistics and Transport | 0034W00002FamUlQAJ | Babara Levy | both |
1 | 0014W000026oyWmQAI | Express Logistics and Transport | 0034W00002FamUmQAJ | Josh Davis | both |
2 | 0014W000026oyWnQAI | University of Arizona | 0034W00002FamUnQAJ | Jane Grey | both |
Example JOINs Using SOQL
Child to Parent Query
Answers the question, “What is the name of our Accounts and which user owns them?”
One-to-One relationship: for each Account, there will be at most one Owner with one Name.
- You can only go one level down in relationships when querying from parent to child.
- You can go five levels up the relationship chain when querying from a child to a parent.
query = '''
SELECT Name, Owner.Name
FROM Account
'''
records = sf.query_all(query)
result = (pd.DataFrame(records['records'])
.drop(columns=['attributes']))
result.head(3)
Name | Owner | |
---|---|---|
0 | Express Logistics and Transport | {'attributes': {'type': 'User', 'url': '/servi... |
1 | University of Arizona | {'attributes': {'type': 'User', 'url': '/servi... |
2 | United Oil & Gas, UK | {'attributes': {'type': 'User', 'url': '/servi... |
Parsing the data with Pandas isn’t as clean as it is when querying a single object. The query results in a nested dictionary that would require additional work to parse the Owner Name out. See below.
records['records'][0]
OrderedDict([('attributes',
OrderedDict([('type', 'Account'),
('url',
'/services/data/v42.0/sobjects/Account/0014W000026oyWmQAI')])),
('Name', 'Express Logistics and Transport'),
('Owner',
OrderedDict([('attributes',
OrderedDict([('type', 'User'),
('url',
'/services/data/v42.0/sobjects/User/0054W00000CKU8EQAX')])),
('Name', 'Ryan Wingate')]))])
Parent to Child Query
Answers the question, “Who are the contacts for each Account?”
One-to-Many relationship: for each Account, there can be multiple Contacts.
query = '''
SELECT Account.Name,
(SELECT Contact.Name FROM contacts)
FROM Account
'''
records = sf.query_all(query)
result = (pd.DataFrame(records['records'])
.drop(columns=['attributes']))
result.head(3)
Name | Contacts | |
---|---|---|
0 | Express Logistics and Transport | {'totalSize': 2, 'done': True, 'records': [{'a... |
1 | University of Arizona | {'totalSize': 1, 'done': True, 'records': [{'a... |
2 | United Oil & Gas, UK | {'totalSize': 1, 'done': True, 'records': [{'a... |