Updating Google Workspaces Fast and Easy with Powerful Scripts, Python, and GAM

Are you in charge of Google Workspaces administration for your organization, but unable to use Active Directory or a syncing service? Ever wondered how you might automate the process of managing accounts? Do you already use Google Apps Manager (GAM), but want to significantly reduce the time it takes to process routine tasks? This article goes into enough depth to help you get started.

Connect with me on LinkedIn or Upwork to get support building a system like the one described in this article for your site. I am available to help you develop the system and learn how to manage it, and can be retained to fix things when something breaks, including any site-specific customizations. Let me help you get it off the ground!

What you need

  • A working knowledge of Python
  • A Google account to access Google Colaboratory, Drive, Sheets, etc.
  • Some experience with, or the willingness to research and learn about, the Google Cloud admin console
  • Some experience working in Command Prompt or Terminal, and a basic knowledge of how to write batch scripts
  • GAMADV-XTD3, which can be installed using the installation instructions here

This article primarily covers what you can do using all of these tools combined and provides a snapshot of the process. If you want to learn more or get direct support from me, reach out via LinkedIn or Upwork.

Google Colaboratory

Google Colaboratory (Colab) is a fantastic resource for coding, especially in Python. It runs notebooks in the cloud, so it uses minimal resources on your computer, connects easily with Google Drive – which can host your data sources – and even offers paid tiers that give you more power and processing for heavier tasks. Everything outlined here can be done with the free tier of Colaboratory.

For more experienced coders, Colaboratory is quite similar to Jupyter Notebook, but doesn’t require a local install or external hosting to run – which is what makes it my preferred option. If you need more power and/or have concerns about security, you can spend a lot more on a high-end GPU setup, but for most everyday projects Colaboratory is a great tool. Check out the subscription tiers and features here.

GAMADV-XTD3

Google Apps Manager (GAM) is a free, open-source command-line tool, originally created by Jay Lee, for administering Google Workspace domains through Google’s APIs.

GAMADV-XTD3 is a fork of the original GAM that simply handles more commands and connections to the Google API. It can do more than stock GAM, and this extra power is what helps make things a lot smoother. There are a lot of resources out there to research and learn more about command-line work with GAM, such as the project wiki and various cheat sheets; a Google search will turn up a decent number of references. Use the installation guide on the GAMADV-XTD3 wiki for the most accurate information on setting up the tool.
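To give a flavor of the syntax, individual GAM commands are one-liners run from Command Prompt or Terminal. These are generic illustrations using a placeholder email address, not commands from my setup:

```
gam info user jdoe@example.org
gam update user jdoe@example.org suspended on
gam print users
```

The first inspects a single account, the second suspends it, and the third prints a CSV-style list of users – the building blocks that batch scripts string together.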

Other stuff

You will need access to the Google Cloud admin console; however, that is mostly explained in the GAM setup. Once your computer has GAM set up correctly, you don’t really interact much with the platform.

Writing out a series of update commands to run once the databases have been updated and exported is what really drives this process. You’ll need a working idea of how to write batch scripts – which are basically just a series of commands run in sequence in Command Prompt. Combined with GAM, this makes most of the processing unattended.

Using Python and Google Collab to update Datasets in Google

There are essentially two parts to this system, with three separate steps to execute it end to end. First, you’ll need to get the data you need from your database or information system. I update Google Workspaces for a school. Our district uses PowerSchool as the student information system, but won’t allow us to connect a syncing tool to Google. Aside from that, there are few if any off-the-shelf tools that can be customized to the degree of the update tool I’ve been able to develop for us.

Second, you need to run the Python script in Colab. This script / notebook needs to be custom built to your needs. In my case, I needed a script that would prepare data to be parsed by GAM commands run from batch files in Command Prompt. All I needed Colab to do was take the raw data from the PowerSchool export and transform it into tables that GAM can use to update various things based on data in a CSV file.

Third, you’ll need to run the custom batch file to execute all the commands. My batch file takes the exports from the Colab script / notebook, pushes commands to Google via the API to update Workspaces, then outputs reports that are the backbone of many of our reporting systems. My tool updates users as students enroll and unenroll from the school, updates various Google groups based on grade level, handles enrollments in Google Classroom for all of our scheduled courses in PowerSchool, and processes a few other tasks we’ve custom built to fit our needs.

If you want help doing this at your site, reach out to me via LinkedIn or Upwork. I’d be glad to point you in the right direction or come on board to support your work directly.

Data Import

I’ve covered a bit of this in Pandas Library – The Powerful Data Custodian – Best Libraries in Python. You basically need to know the format of your data source – specifically its separator – in order to import the data into Colab using pd.read_csv. Alternatively, if you already have the dataset somewhere in Google Drive, e.g. in a Google Sheet, you can import it directly using gspread. Side note: you can also connect to SQL databases and various other sources, but I am not covering that here; I’ll be covering SQL and connecting with more cloud-based databases / data sources in the future.
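For example, here is a minimal, self-contained sketch of reading a tab-separated export with pd.read_csv. The data and column names are invented for illustration; the `sep` and `dtype` parameters are the ones that matter:

```python
import io

import pandas as pd

# A fake tab-separated export; real files can carry any extension (.text, .tsv, ...)
raw = "Student_Number\tLast_Name\tFirst_Name\n1001\tDoe\tJane\n1002\tSmith\tAlex\n"

# sep='\t' matches the file's separator; dtype='str' keeps IDs as text
# so leading zeros are not lost to numeric conversion
df = pd.read_csv(io.StringIO(raw), sep='\t', dtype='str')
print(df.shape)             # (2, 3)
print(df.columns.tolist())  # ['Student_Number', 'Last_Name', 'First_Name']
```

The same call works for a file on disk or, as shown later, for bytes uploaded into a Colab notebook.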

Data Manipulation

The most important step in making sure your updates go smoothly is understanding how your data needs to be structured. GAM has a csv feature that lets you pass it the location and name of a CSV file, a GAM command, and references to the correct headers in that file, and it will process a command for every row. It is far, far faster for managing large updates.
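As a sketch of that feature (the file name and headers here are made up): given a CSV with Email, First, Last, and Password columns, a single command can drive one account creation per row, with ~Header substituting each row’s value:

```
gam csv new_users.csv gam create user ~Email firstname ~First lastname ~Last password ~Password
```

This is why the structure of the exported CSV matters so much – the headers have to line up with the references in the command.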

In my case, I take our expansive dataset from PowerSchool, check the entire set for new users – comparing against the last user update – parse all the data for each class enrollment (which requires converting some in-line data into separate rows for each course a student takes), and handle phone numbers and addresses, which get split off into datasets used to update some of our communication tools.
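The class-enrollment part of that transformation follows a common pandas pattern: split the packed in-line field into a list, then explode it into one row per course. This is a simplified sketch with invented column names and data, not my production code:

```python
import pandas as pd

# Hypothetical export: each student's courses packed into one comma-delimited field
students = pd.DataFrame({
    'Student_Number': ['1001', '1002'],
    'Courses': ['ENG101,ALG1,BIO1', 'ENG101,GEO1'],
})

# Split the packed field into lists, then explode to one row per (student, course)
enrollments = (
    students.assign(Course=students['Courses'].str.split(','))
            .explode('Course')
            .drop(columns='Courses')
            .reset_index(drop=True)
)
print(len(enrollments))  # 5
```

Each resulting row pairs one student ID with one course – exactly the one-command-per-row shape that GAM’s csv feature expects.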

Workspaces Update

Google Colab does all the work to transform this data using some basic data-cleaning and preparation skills I gained in my Practicum by Yandex course. The data is prepared and stored in CSV files in Google Drive, which then sync to my local hard drive via Drive for Desktop. I have a batch file written and stored in the same folder in Drive. Once the data is set up for GAM to process, I am able to run the batch file and update everything we need to in Google Workspaces. Let’s get into showing you a bit of how this works.

Peeking inside the process

Data Export

Possibly the simplest step: I export the data we want from our central database, in our case PowerSchool. This could also come from a different database system, a SQL database online, a Google Sheet, or an Excel / Access workbook or database. It doesn’t much matter what the source is as long as you know the structure of the file. My export gives me a file that looks like this:

This file is formatted as a tab-separated values (TSV) file with a file extension of .text. The great thing about read_csv is that the file extension doesn’t matter; it reads the file based on the separator we define in the sep parameter. I also import files from the google.colab library so I can upload the file directly into the notebook. This avoids having to upload it and link it to the notebook manually, which just makes things simpler. There are a few steps in the notebook before this that allow me to connect to Drive and update sheets with specific instructions later on. This is what the code looks like:

import io
import os

import pandas as pd
from google.colab import files

# Prompt for a file upload and grab the uploaded file's name
uploaded = files.upload()
filename = str(next(iter(uploaded)))

print(uploaded)
print(filename)

# Read the tab-separated export; dtype='str' keeps IDs and codes as text
Student_Data_PS = pd.read_csv(io.BytesIO(uploaded[filename]), sep='\t', dtype='str')
print(Student_Data_PS.dtypes)
print(Student_Data_PS.head())
print()
os.listdir()

This code allows me to select the file to upload and stores it as a DataFrame in Student_Data_PS. I then use some code to rename columns to a more readable format. Some of the queries required to export this data set use the query / table name as the header, so I have to replace these with labels that are more descriptive and specific. The script then stores that data in a Google Sheet using gspread. Here’s what that code looks like, with some information removed for security reasons:

import gspread
from gspread_dataframe import set_with_dataframe

# ACCESS GOOGLE SHEET
gc = gspread.service_account(filename='/content/drive/MyDrive/*Python*/XXXXXXXXXXX.json')
sh = gc.open_by_key('XXXXXXXXXXXXXXXXXXXXXXXXXXX')
worksheet = sh.get_worksheet(0)  # 0 - first sheet, 1 - second sheet, etc.

# CLEAR SHEET CONTENT
sh.values_clear("PSData!A2:FC1000")

# APPEND DATA TO SHEET
set_with_dataframe(worksheet, Student_Data_PS)  # exports the DataFrame to the Google Sheet

This stores the data from Student_Data_PS in a specific sheet that numerous other reports sit on top of. This method works well for us because most of those reports are small-scale – fewer than 10,000 records – so using Sheets causes very little slowdown.

Updating datasets

The next task is to update a variety of sheets in Google Drive that are used to identify new users, class enrollment changes, teacher records, and more by referencing the reports generated at the end of the last update. I’ll explain this more in the section below. It is also good practice to print or display these tables along the way to make it easier to visually confirm that things are working as they should.

Exporting datasets for use with GAM Batch Script

The third step is taking the manipulated / prepared data and generating the lists of new users, classes, enrollments, etc., for GAM to use when updating Google Workspaces. I use a boolean condition to identify users in the new data who weren’t in the most recent user list exported by the GAM batch script. We then export those records to a CSV that the GAM batch script will use to update our user database. A similar thing is done to update classes and enrollments. Here’s a sample of the code I use to create this CSV by comparing the data:

# Flag PowerSchool records whose student ID already exists in the Google user export
psdata['in_google'] = psdata.Student_Number.isin(guser.customSchemas_StudentData_id) | psdata.Student_Number.isin(guser.externalIds_0_value)

# Keep only records not yet in Google, with the columns GAM needs
new_users = psdata.loc[psdata['in_google'] == False, ['Student_Number', 'Last_Name', 'First_Name', 'LastFirst', 'U_StudentsUserFields.sps_emailalias', 'DOB', 'Grade_Level', 'S_MA_STU_DEMOGRAPHIC_X.EthnicityRaceCode', 'Gender', 'SPED_Program','U_StudentsUserFields.lep_level', 'S_MA_STU_DEMOGRAPHIC_X.LEPProgramStatus']]

# Build a default password from the date of birth (Sp + MMDDYY)
new_users['DOB_month'] = pd.DatetimeIndex(new_users['DOB']).month.map("{:02}".format).astype(str)
new_users['DOB_year'] = pd.DatetimeIndex(new_users['DOB']).year.astype(str).str[2:4]
new_users['DOB_day'] = pd.DatetimeIndex(new_users['DOB']).day.map("{:02}".format).astype(str)
new_users['password'] = "Sp"+new_users['DOB_month']+new_users['DOB_day']+new_users['DOB_year']

# Build the two email addresses and fill in blank program fields
new_users['HSC_Email'] = new_users['First_Name']+new_users['Last_Name'].str[0:1]+new_users['Student_Number'].astype(str).str[-2:].astype(str)+"@hscommerce.org"
new_users['SPS_email'] = new_users['Student_Number']+"@springfieldpublicschools.com"
new_users['SPED_Program'] = new_users['SPED_Program'].replace(r'^\s*$', "Not Sped", regex=True)
new_users['U_StudentsUserFields.lep_level'] = new_users['U_StudentsUserFields.lep_level'].replace(r'^\s*$', "No_Level", regex=True)

# Rename columns to the headers the GAM batch script expects
new_users.columns = ['ID', 'Last', 'First', 'LastFirst', 'Alias', 'DOB', 'Grade', 'Demographics', 'Gender', 'SPED', 'FLEP_Level', 'FLEP_Status', 'month', 'year', 'day', 'password', 'HSC_email', 'SPS_email']
display(new_users)

new_users.to_csv('/content/drive/MyDrive/*Python*/new_users.csv', index=False, encoding='utf-8-sig')

DataFrame.to_csv exports the new_users DataFrame into a specific folder in Drive that I have set to always be available on my computer. When the script updates it, the local file also updates, making sure everything is ready to be handled by the GAM batch script.

GAM Batch Script

The Colab script is finished once it has prepared and exported the data to the various CSV files. Here is a snapshot of the folder that hosts these files:

BatchChanges.bat is then run. It updates users, classes, and enrollments based on the data in the various CSV files. Here’s a peek at the batch file code and what it looks like when run:
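The actual file is specific to our data, but a minimal sketch of such a batch file – reusing the headers from the new_users.csv export above, with the rest of the details simplified – might look like:

```
@echo off
rem Create any new accounts listed in the Colab export
gam csv new_users.csv gam create user ~SPS_email firstname ~First lastname ~Last password ~password

rem Refresh the user report that the next Colab run compares against
gam print users > current_users.csv
```

The real script chains many more gam csv commands – groups, Classroom courses, enrollments – but they all follow this same read-a-CSV, run-a-command-per-row pattern.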

Once the batch script finishes its run, everything is updated; new reports on users, courses, and course participants are generated and pushed into specific sheets in Drive; and we are good to go until the next update. The whole process takes at most 45 minutes, most of which is waiting for the data export to generate in PowerSchool.

Want to build something similar for your institution or school? Reach out to me via LinkedIn or Upwork.

Don’t hesitate to explore using Python for similar tasks in your workplace. There are a whole variety of ever-growing tools in Python that make life easier. In the meantime, check out the links below to plug in and learn more Python today.

Sign up for DataCamp today – https://www.datacamp.com/promo/build-the-future-2022 – Get 53% off for your first year

Practicum by Yandex – https://practicum.yandex.com/

Pathstream – https://www.pathstream.com