Overview
Data integration changes are common, whether due to institutional business process changes, data definition changes, or product enhancements that support additional data for integration. The process for making these changes is the same one used for data integration during implementation.
Understanding the data integration tool for SS&E (Watermark Connect) is helpful when making data integration changes, as they usually require testing and iteration. This article explains how to isolate different steps in the data integration process in order to troubleshoot and test individual data feeds or steps in the extract-transform-load (ETL) process.
For more details on the data integration process steps, see the SIS Data Import Process and Infrastructure Summary.
1) Connect2/Accelerate
The Connect2/Accelerate tool runs automatically on each individual client site on the Connect server provided and owned by the client. The default schedule is configured to run at 2am, 8am, 2pm, 8pm. In most situations this process takes between 15 minutes and an hour, but the time can vary depending on the amount of data you are exporting. The tool:
- Retrieves the source data (either a CSV or a Direct DB connection)
- Transforms the source data to JSON files (e.g. the data extract files)
- SFTPs the JSON files to a Watermark AWS S3 bucket folder
- Success: If the Connect job completes successfully, the JSON data will be visible in SS&E Administration and can be accessed from SIS Integration > Datafeeds > Download Extract.
See the related article for more information about how to download and view data extracts within SS&E.
- Failure: Issues with the Connect job (scheduled task) processing some or all feeds can often be identified in External Logging.
- Navigate to SS&E Administration > Advanced > External Logging
- Feed Failure: If you notice an individual feed with errors, it can generally be resolved by correcting the source data or by updating the Accelerate query.
- Task Failure: If you notice that there are no log entries for a specific day/time period, begin by troubleshooting the scheduled task and the Connect server.
Run Accelerate Queries Locally
See this article for instructions on running Accelerate queries locally on the Connect server: Run Accelerate Queries Locally on the Connect Server
Run the Connect2 Job Manually
There are two different ways the Connect job can be run manually from the client Connect server for all data feeds:
- Using Task Scheduler
- Open Windows Task Scheduler and navigate to the Task Scheduler Library
- Select Aviso Connect2 (not the Auto-updater)
- Select Run/Enable
- Using PowerShell (a combined sketch of these commands follows this list)
- Open PowerShell as administrator
- Navigate to C:\Aviso\Connect2
- Run Connect2.ps1: .\Connect2.ps1
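Putting these steps together, here is a minimal sketch of a manual run from an elevated PowerShell session, assuming the default install path shown above:
Set-Location -Path 'C:\Aviso\Connect2'  # Connect2 install directory
.\Connect2.ps1                          # processes all enabled data feeds
This runs the same process as the scheduled task, so expect it to take roughly the same amount of time (15 minutes to an hour).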
Limit the Connect job to Individual Feed(s) Indefinitely
To limit the scheduled Connect job to processing one or more individual data feeds indefinitely, the application.properties file can be edited.
- Enable the desired streams in application.properties by adding:
streams.streamNames = nameOfStream1,nameOfStream2
Example that would only run the academic calendar, course, and course section feeds:
streams.streamNames=academicCalendar,course,courseSection
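Because it is easy to forget to re-enable all data streams later (see the note in the next section), consider backing up the original file before editing it. Assuming application.properties lives in the Connect2 directory shown above, for example:
Copy-Item 'C:\Aviso\Connect2\application.properties' 'C:\Aviso\Connect2\application.properties.bak'
Restoring the backup copy returns the scheduled job to its original configuration.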
Create a Secondary Connect Script to Test Individual Feed(s)
A secondary Connect script can also be created to limit streams that the Connect2 job processes.
This is helpful when testing Accelerate queries, since it allows a secondary Connect2 job to run only on specific data streams instead of running the full process. It also doesn't interfere with the regularly scheduled Connect2 job processing (it's easy to forget to re-enable all data streams in application.properties, which can cause data feeds to go out of date).
To create a secondary Connect2 scheduled job (a PowerShell sketch of these steps follows the list):
- Create a copy of Connect2.ps1 named partConnect2.ps1
- Edit partConnect2.ps1 to refer to partial.properties instead of application.properties
- Create a copy of application.properties named partial.properties
- Update the log file path to C:/Aviso/Connect2/logs/part-application.log (so the secondary job logs separately from the main job)
- Enable the desired streams for testing in partial.properties, by adding:
streams.streamNames = nameOfStream1,nameOfStream2
- Run partConnect2.ps1 in Task Scheduler or in PowerShell from C:\Aviso\Connect2: .\partConnect2.ps1
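A minimal PowerShell sketch of the file steps above, assuming Connect2.ps1 references application.properties by file name (if it does not, edit partConnect2.ps1 manually rather than using the text replacement below):
Set-Location -Path 'C:\Aviso\Connect2'
# Copy the script and the properties file
Copy-Item .\Connect2.ps1 .\partConnect2.ps1
Copy-Item .\application.properties .\partial.properties
# Point the copied script at partial.properties instead of application.properties
(Get-Content .\partConnect2.ps1) -replace 'application\.properties', 'partial.properties' | Set-Content .\partConnect2.ps1
# Then edit partial.properties: update the log file path and set streams.streamNames to the streams under test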
2) SS&E Import
The SS&E Import job takes the JSON files from the AWS S3 data store and ingests them into the SS&E database.
- Success: If the SIS Import job completes successfully, the SIS data is available in SS&E and visible both from the UI and from SS&E Administration > SIS Integration > Datafeeds > Imported Data View (which allows Administrator users to access the raw data stored in the SS&E database).
- Import Process Failure: In some cases an unexpected issue can arise that causes the entire import process to fail. To see this type of error:
- Navigate to SS&E Administration > Advanced > Batch Processing and switch to page 2
- Click the sisImportBatchJob link
- You will be able to view the run time and status of previous and currently running import jobs. A status of "FAILED" or "STOPPING" means the import process failed. You can dig deeper by looking into the job log, but it's generally recommended to reach out to support for assistance if you notice consistent job failures.
- Data Feed File Failure: If a data feed JSON file is invalid, this can cause the entire file to fail to parse. To see this type of error:
- Navigate to SS&E Administration > SIS Integration > Import Errors
- Search for "parse" in the "Search Message Text" field.
- This type of failure is most common if there are unexpected fields in the data feed JSON file. Review the SIS Import Definitions article and the troubleshooting article: SIS Import Error "Could not parse file:"
- Record Import Failure: If certain records in the import data feed fail, the resulting record import errors can in most cases be searched.
- Navigate to SS&E Administration > SIS Integration > Import Errors
- Import errors are often "expected" if the JSON files include records that reference missing data from other feeds. More information about data feeds associations can be found in the SIS Import Definitions article.
- Caching Issue: If the data is correct in the JSON file but incorrect in the Imported Data View and there are no Import Errors, there could be a caching issue. To resolve a caching issue:
- From SS&E Administration, navigate to SIS Integration > Manual API Calls
- Enter the data stream name(s) that you wish to reset in the Connector Cache box.
- Select Reset Cache.
Note: It is not recommended to reset all caches, as the following import will take a very long time.
- Review the data after the import runs again to confirm the issue is resolved.
Run the SIS Import Job Manually
The SIS Import job runs automatically four times daily. The default schedule is configured to run at 4:30am, 10:30am, 4:30pm, 10:30pm. To run the SIS Import job manually for all Data Feeds:
- From SS&E Administration, navigate to Advanced > Scheduled Jobs
- Scroll to sisImportBatchJob and select "Run"
Limit the SIS Import to Individual Data Feed(s) Indefinitely
To turn the SIS Import Job on or off for individual data feeds permanently/indefinitely, any data feed can be edited in SS&E Administration using the following steps:
- From SS&E Administration, navigate to SIS Integration > Datafeeds
- Select the data feed name you wish to import or stop importing
- Check or uncheck the Enabled box as desired
- Save your changes
Limit the SIS Import to Individual Feed(s)
To limit the SIS Import job to running only on specific data feeds, those feeds can be selected; ALL other feeds will then be ignored by the import job.
NOTE: There is no reporting or alerting that monitors this restriction, and the restricted data import will remain in place until the restricted streams are removed, essentially preventing all other data feeds from importing data into SS&E.
- From SS&E Administration, navigate to SIS Integration > Manual API Calls
- Enter the data stream name(s) that you wish the SIS Import job to process in the Restrict Streams box
- Click Restrict Streams to save your entry
- Run the SIS Import job manually or wait until it runs automatically
- Remember to remove all stream names from the Restricted Streams list in order to resume imports for all streams.
NOTE: It is VERY IMPORTANT to manage this and make sure this is reversed when testing is complete!