Overview
Data integration changes are common, whether due to institutional business process changes, data definition changes, or product enhancements that support additional data for integration. The process described here is the same one used during data integration at implementation.
Understanding the data integration tool for SS&E (Watermark Connect) is helpful when making data integration changes, as they usually require testing and iteration. This article explains how to isolate different steps in the data integration process to troubleshoot and test individual data feeds or steps in the extract-transform-load (ETL) process.
For more details on the data integration process steps, see the SIS Data Import Process and Infrastructure Summary.
1) Connect2/Accelerate
The Connect2/Accelerate tool runs automatically on each individual client site on the Connect server provided and owned by the client. The default schedule is configured to run at 2am, 8am, 2pm, and 8pm. In most situations this process takes between 15 minutes and an hour, but the time can vary depending on the amount of data you are exporting. The tool:
- Retrieves the source data (either a CSV or a Direct DB connection)
- Transforms the source data to JSON-type files (e.g., data extract files)
- SFTPs the JSON files to a Watermark AWS S3 bucket folder
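As an illustration only, the three stages correspond conceptually to the PowerShell sketch below. The real tool is driven by application.properties, not a script like this, and the file paths and SFTP details here are hypothetical:
# Conceptual sketch of the three Connect2 stages (illustrative; paths are hypothetical)
$rows = Import-Csv 'C:\Aviso\source\student.csv'                  # 1) retrieve source data (CSV or direct DB query)
$rows | ConvertTo-Json | Set-Content 'C:\Aviso\out\student.json'  # 2) transform to a JSON data extract file
# 3) SFTP student.json to the client's Watermark AWS S3 bucket folder
#    (host, credentials, and destination folder are site-specific)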
Success: If the Connect job completes successfully, the JSON data will be visible in SS&E Administration and can be accessed from SIS Integration > Datafeeds > Download Extract.
Failure: Issues with the Connect job (scheduled task) processing some or all feeds can often be identified in External Logging. Individual feeds can fail, or the entire task can fail.
See SIS Integration Troubleshooting: Connect2/Accelerate for basic troubleshooting instructions.
Run Accelerate Queries Locally
See this article for instructions on running Accelerate queries locally on the Connect server: Run Accelerate Queries Locally on the Connect Server
Run the Connect2 Job Manually
There are two different ways the Connect job can be run manually from the client Connect server for all data feeds:
- Using Task Scheduler
  - Open Windows Task Scheduler and navigate to the Task Scheduler Library
  - Select Aviso Connect2 (not the Auto-updater)
  - Select Run/Enable
- Using PowerShell
  - Open PowerShell as administrator
  - Navigate to C:\Aviso\Connect2
  - Run Connect2.ps1: .\Connect2.ps1
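For example, the PowerShell option looks like this in an elevated session (paths taken from the steps above):
Set-Location C:\Aviso\Connect2   # the Connect2 install directory
.\Connect2.ps1                   # runs the Connect job for all data feeds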
Limit the Connect job to Individual Feed(s) Indefinitely
To limit the scheduled Connect job to processing individual data feed(s) indefinitely, edit the application.properties file.
- Enable the desired streams in application.properties by adding:
streams.streamNames = nameOfStream1,nameOfStream2
Example that would run only the academic calendar, course, and course section feeds:
streams.streamNames=academicCalendar,course,courseSection
Create a Secondary Connect Script to Test Individual Feed(s)
A secondary Connect script can also be created to limit streams that the Connect2 job processes.
This is helpful when testing Accelerate queries: it allows a secondary Connect2 job to run only specific data streams instead of the full process, without interfering with the regularly scheduled Connect2 job. (When editing application.properties directly, it is easy to forget to re-enable all data streams, which can cause data feeds to go out of date.)
To create a secondary Connect2 scheduled job:
- Create a copy of Connect2.ps1 named partConnect2.ps1
- Edit partConnect2.ps1 to refer to partial.properties instead of application.properties
- Create a copy of application.properties named partial.properties
- Update the log file to C:/Aviso/Connect2/logs/part-application.log
- Enable the desired streams for testing in partial.properties, by adding:
streams.streamNames = nameOfStream1,nameOfStream2
- Run partConnect2.ps1 in Task Scheduler or in PowerShell from C:\Aviso\Connect2: .\partConnect2.ps1
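Taken together, the setup steps look roughly like the following sketch, assuming the default C:\Aviso\Connect2 install path (the edits inside partConnect2.ps1 and partial.properties are still made by hand):
Set-Location C:\Aviso\Connect2
Copy-Item .\Connect2.ps1 .\partConnect2.ps1               # secondary script for testing
Copy-Item .\application.properties .\partial.properties   # secondary properties file
# Edit partConnect2.ps1 to refer to partial.properties, update the log file to
# C:/Aviso/Connect2/logs/part-application.log, and set streams.streamNames in
# partial.properties, then run:
.\partConnect2.ps1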
Make Changes to Accelerate
See the Accelerate Configuration article for more information on configuring Accelerate.
2) SS&E Import
The SS&E Import job takes the JSON files from the AWS S3 data store and ingests them into the SS&E database.
Success: If the SIS Import job completes successfully, the SIS data is available in SS&E and visible from SS&E Administration > SIS Integration > Datafeeds > Imported Data View.
Failure: Issues with the Import include Import Process failure, Data Feed File failure, Record Import failure, and Caching issues.
See SIS Integration Troubleshooting: Import for basic troubleshooting instructions.
Reset Import Cache
If the data is correct in the JSON file but incorrect in the Imported Data View, and there are no Import Errors, there could be a caching issue. To resolve a caching issue:
- From SS&E Administration, navigate to SIS Integration > Manual API Calls
- Enter the data stream name(s) that you wish to reset in the Connector Cache box
- Select Reset Cache
Note: It is not recommended to reset all caches, as the following import will take a very long time.
- Review the data after the import runs again to confirm the issue is resolved
Run the SIS Import Job Manually
The SIS Import job runs automatically four times daily. The default schedule is configured to run at 4:30am, 10:30am, 4:30pm, and 10:30pm. To run the SIS Import job manually for all Data Feeds:
- From SS&E Administration, navigate to Advanced > Scheduled Jobs
- Scroll to sisImportBatchJob and select "Run"
Limit the SIS Import to Individual Data Feed(s) Indefinitely
To turn the SIS Import job on or off for individual data feeds indefinitely, any data feed can be edited in SS&E Administration using the following steps:
- From SS&E Administration, navigate to SIS Integration > Datafeeds
- Select the data feed name you wish to import or stop importing
- Check or uncheck the Enabled box as desired
- Save your changes
Limit the SIS Import to Individual Feed(s) Temporarily
To limit the SIS Import job to run only on specific data feeds, those feeds can be selected; ALL other feeds will then be ignored by the import job.
NOTE: There is no reporting or alerting that monitors this restriction. It remains in place until the restricted streams are removed, essentially preventing all other data feeds from importing data into SS&E.
- From SS&E Administration, navigate to SIS Integration > Manual API Calls
- Enter the data stream name(s) that you wish the SIS Import job to process in the Restrict Streams box
- Click Restrict Streams to save your entry
- Run the SIS Import job manually or wait until it runs automatically
- Remember to remove all stream names from the Restricted Streams list in order to resume imports for all streams.
NOTE: It is VERY IMPORTANT to manage this and make sure this is reversed when testing is complete!
Additional Information
- Quick Guide: View JSON Extract and Imported Data
- Quick Guide: SIS Integration Troubleshooting
- SIS Data Import Process
- Run Accelerate Queries Locally on the Connect Server
- Accelerate Configuration