Joining Salesforce Object Data

Salesforce objects cannot be joined using SOQL like tables in traditional databases, unfortunately. More context and detail here and here.

SOQL really only supports selecting data from multiple objects if there is an explicit lookup or master-detail relationship between them.

from simple_salesforce import Salesforce
import pandas as pd

Credentials below are for a trailhead org created for purposes of this note, only. It contains only default data provided by Salesforce.

sf = Salesforce(username='[email protected]',
                password='password_201019',
                security_token='ozSA9fb0lwwyqslXtUDxRDQu')

Examples Joins

These examples are taken from this link.

There are two approaches demonstrated below. The first involves merging the tables using Pandas. The seconds attempts a JOIN using SOQL, which has no native JOIN keyword.


Example JOINs Using Pandas

Each object is queried into its own Pandas DataFrame and then joined using Panda’s merge function.

Child to Parent Query

Answers the question, “What is the name of our Accounts and which user owns them?”

query = '''
SELECT Id, Name, OwnerId
  FROM Account
'''
records = sf.query_all(query)
account = (pd.DataFrame(records['records'])
          .drop(columns=['attributes']))
account.head(3)

Id Name OwnerId
0 0014W000026oyWmQAI Express Logistics and Transport 0054W00000CKU8EQAX
1 0014W000026oyWnQAI University of Arizona 0054W00000CKU8EQAX
2 0014W000026oyWoQAI United Oil & Gas, UK 0054W00000CKU8EQAX
query = '''
SELECT Id, Name
  FROM User
'''
records = sf.query_all(query)
user = (pd.DataFrame(records['records'])
        .drop(columns=['attributes']))
user.columns = 'Owner' + user.columns
user.head(3)

OwnerId OwnerName
0 0054W00000CElpoQAD Automated Process
1 0054W00000CElpnQAD Integration User
2 0054W00000CElpvQAD Security User
result = pd.merge(account, user,
                  on='OwnerId',
                  how='left',
                  indicator=True)
print(result._merge.value_counts())
result.head(3)
both          12
right_only     0
left_only      0
Name: _merge, dtype: int64

Id Name OwnerId OwnerName _merge
0 0014W000026oyWmQAI Express Logistics and Transport 0054W00000CKU8EQAX Ryan Wingate both
1 0014W000026oyWnQAI University of Arizona 0054W00000CKU8EQAX Ryan Wingate both
2 0014W000026oyWoQAI United Oil & Gas, UK 0054W00000CKU8EQAX Ryan Wingate both

Parent to Child Query

Answers the question, “Who are the contacts for each Account?”

query = '''
SELECT Id, Name
  FROM Account
'''
records = sf.query_all(query)
account = (pd.DataFrame(records['records'])
          .drop(columns=['attributes']))
account.columns = 'Account' + account.columns
account.head(3)

AccountId AccountName
0 0014W000026oyWmQAI Express Logistics and Transport
1 0014W000026oyWnQAI University of Arizona
2 0014W000026oyWoQAI United Oil & Gas, UK
query = '''
SELECT Id, Name, AccountId
  FROM Contact
'''
records = sf.query_all(query)
contact = (pd.DataFrame(records['records'])
        .drop(columns=['attributes']))
contact.head(3)

Id Name AccountId
0 0034W00002FamUcQAJ Rose Gonzalez 0014W000026oyWgQAI
1 0034W00002FamUdQAJ Sean Forbes 0014W000026oyWgQAI
2 0034W00002FamUeQAJ Jack Rogers 0014W000026oyWhQAI
result = pd.merge(account, contact,
                  on='AccountId',
                  how='left',
                  indicator=True)
print(result._merge.value_counts())
result.head(3)
both          20
right_only     0
left_only      0
Name: _merge, dtype: int64

AccountId AccountName Id Name _merge
0 0014W000026oyWmQAI Express Logistics and Transport 0034W00002FamUlQAJ Babara Levy both
1 0014W000026oyWmQAI Express Logistics and Transport 0034W00002FamUmQAJ Josh Davis both
2 0014W000026oyWnQAI University of Arizona 0034W00002FamUnQAJ Jane Grey both

Query with WHERE and Combinational Logic

query = '''
SELECT Id, Name
  FROM Account
 WHERE Name = 'Express Logistics and Transport'
    OR Name = 'University of Arizona'
'''
records = sf.query_all(query)
account = (pd.DataFrame(records['records'])
          .drop(columns=['attributes']))
account.columns = 'Account' + account.columns
account.head(3)

AccountId AccountName
0 0014W000026oyWmQAI Express Logistics and Transport
1 0014W000026oyWnQAI University of Arizona
result = pd.merge(account, contact,
                  on='AccountId',
                  how='left',
                  indicator=True)
print(result._merge.value_counts())
result.head(3)
both          3
right_only    0
left_only     0
Name: _merge, dtype: int64

AccountId AccountName Id Name _merge
0 0014W000026oyWmQAI Express Logistics and Transport 0034W00002FamUlQAJ Babara Levy both
1 0014W000026oyWmQAI Express Logistics and Transport 0034W00002FamUmQAJ Josh Davis both
2 0014W000026oyWnQAI University of Arizona 0034W00002FamUnQAJ Jane Grey both

Example JOINs Using SOQL

Child to Parent Query

Answers the question, “What is the name of our Accounts and which user owns them?”

One-to-One relationship: for each Account, there will be at most one Owner with one Name.

  • You can only go one level down in relationships when querying from parent to child.
  • You can go five levels up the relationship chain when querying from a child to a parent.
query = '''
SELECT Name, Owner.Name
  FROM Account
'''
records = sf.query_all(query)
result = (pd.DataFrame(records['records'])
          .drop(columns=['attributes']))
result.head(3)

Name Owner
0 Express Logistics and Transport {'attributes': {'type': 'User', 'url': '/servi...
1 University of Arizona {'attributes': {'type': 'User', 'url': '/servi...
2 United Oil & Gas, UK {'attributes': {'type': 'User', 'url': '/servi...

Parsing the data with Pandas isn’t as clean as it is when querying a single object. The query results in a nested dictionary that would require additional work to parse the Owner Name out. See below.

records['records'][0]
OrderedDict([('attributes',
              OrderedDict([('type', 'Account'),
                           ('url',
                            '/services/data/v42.0/sobjects/Account/0014W000026oyWmQAI')])),
             ('Name', 'Express Logistics and Transport'),
             ('Owner',
              OrderedDict([('attributes',
                            OrderedDict([('type', 'User'),
                                         ('url',
                                          '/services/data/v42.0/sobjects/User/0054W00000CKU8EQAX')])),
                           ('Name', 'Ryan Wingate')]))])

Parent to Child Query

Answers the question, “Who are the contacts for each Account?”

One-to-Many relationship: for each Account, there can be multiple Contacts.

query = '''
SELECT Account.Name,
       (SELECT Contact.Name FROM contacts)
  FROM Account
'''
records = sf.query_all(query)
result = (pd.DataFrame(records['records'])
          .drop(columns=['attributes']))
result.head(3)

Name Contacts
0 Express Logistics and Transport {'totalSize': 2, 'done': True, 'records': [{'a...
1 University of Arizona {'totalSize': 1, 'done': True, 'records': [{'a...
2 United Oil & Gas, UK {'totalSize': 1, 'done': True, 'records': [{'a...