Azure Synapse - Target environment

This article proposes a possible target environment on Azure Synapse for the Microsoft Fabric DataVault and Microsoft Fabric DataVault and Mart generators.

Installation and configuration of the target environment are not part of biGENIUS support.

Unfortunately, we cannot provide any help beyond the example in this article.

Many other configurations and installations are possible for a Spark target environment.

Below is a possible target environment setup on Azure Synapse for a Microsoft Fabric generator.

The Target Platform property should be set to the Azure Synapse value:

Setup environment

The Azure Synapse target environment requires at least the following Azure resources in your Azure Subscription:

  • A Resource Group: some help is available here
  • Inside the Resource Group:
    • 2 Storage Accounts: some help is available here
      • 1 for the Source data
      • 1 for the Target Data Lake
    • An Apache Spark Pool: some help is available here
    • A Synapse Workspace: some help is available here

In this example target environment, we will use the following resources:

  • A Resource Group named bg-synapse
  • A Storage Account for our Source Data named bgsynapselandingzone1
  • A Storage Account for our Target Data Lake named bgsynapsedatalake1
  • An Apache Spark Pool named bgaasspark33v2
  • A Synapse Workspace named bgsynapseworkspace1
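The ADLS Gen2 endpoints of these Storage Accounts follow a fixed URI pattern, which the artifacts below rely on. The short Python sketch here builds the abfss:// URIs from the example resource names above; the container names ("source", "claire-datalake") are the ones used later in this article, but treat them as illustrative:

```python
# Build ADLS Gen2 (abfss) URIs for the example Storage Accounts.
# Container names ("source", "claire-datalake") match the examples in this
# article; adapt account, container, and path to your own setup.
def abfss_uri(container: str, account: str, path: str = "") -> str:
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

source_uri = abfss_uri("source", "bgsynapselandingzone1", "delta/creditcard/")
target_uri = abfss_uri("claire-datalake", "bgsynapsedatalake1")

print(source_uri)
# abfss://source@bgsynapselandingzone1.dfs.core.windows.net/delta/creditcard/
```

The same pattern applies to any additional containers or folders you create in either Storage Account.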

Tools to install

Please install the following tools:

  • Azure Storage Explorer: available here

Target Storage Account

We have chosen to create a folder named claire-datalake in our Target Storage Account:

  • Open Azure Storage Explorer
  • Connect to your Subscription
  • Open the Target Storage Account
  • Create a folder 

For this example, we have one Target folder for our Data Lake:

Source Data

There are three ways to provide source data to the Microsoft Fabric Data Vault or Data Vault and Mart generators:

  • From Parquet files by using the Microsoft Fabric Stage Files generator as a Linked Project
  • From any database accessed through JDBC by using the Microsoft Fabric Stage JDBC generator as a Linked Project
  • From existing Delta files by using a direct Discovery to a Spark External Table that contains the Delta file data

Parquet Files

If your source data are stored in Parquet files, please:

  • Create a first Project with the Microsoft Fabric Stage Files generator
  • In this first Project, discover the Parquet files, create the Stage Model Object, generate, deploy, and load data in a Lake House.
  • Create a second Project with the Microsoft Fabric Data Vault or DataVault and Mart generators.
  • In this second Project, use the first Project Stage Model Object as a source by using the Linked Project feature.

Database

If your source data are stored in a database such as Microsoft SQL Server or Postgres (or any database you can access through JDBC), please:

  • Create a first Project with the Microsoft Fabric Stage JDBC generator
  • In this first Project, discover the database tables, create the Stage Model Object, generate, deploy, and load data in a Lake House.
  • Create a second Project with the Microsoft Fabric Data Vault or Microsoft Fabric DataVault and Mart generator
  • In this second Project, use the first Project Stage Model Object as a source by using the Linked Project feature.

Delta Files

You may have existing Delta Files in your Lake House that you want to use as source data.

Please manually create a Spark External Table in your Lake House containing the Delta File data.

For example, we have the following Delta File containing Credit Card data in our Lake House named bglakehouse1:

We will create a Spark External table named creditcard_delta with the following code:

# Replace bgsynapselandingzone1 with the name of your Source Storage Account.
# A table created with an explicit LOCATION is an external (unmanaged) table.

df = spark.sql("""
    CREATE TABLE stage.creditcard_delta
    USING DELTA
    LOCATION 'abfss://source@bgsynapselandingzone1.dfs.core.windows.net/delta/creditcard/'
""")

Then, create a Discovery in the Microsoft Fabric DataVault or DataVault and Mart Project using a JSON discovery file generated by the Discovery Companion Application with a Delta file.

Upload Artifacts in Azure Synapse

Please now upload the generated Artifacts from the biGENIUS-X application to the Azure Synapse Workspace.

Please replace the placeholders before uploading the artifacts.

  • Click on the Synapse Workspace in Azure:
  • Then click on the Workspace web URL:
  • Azure Synapse Analytics is opened:
  • Click on the Develop menu on the left-hand side:
  • If you are in the live mode of Synapse, change it to the Azure DevOps Git mode:
  • Create a new branch named demo_synapse:

  • We have chosen to create a folder named Artifacts:

In the file 500_Deploy_and_Load_DataVault_Synapse.ipynb, replace the names XXX_Deployment, XXX_SimpleLoadexecution.ipynb, and XXX_SelectResults.ipynb with the names of your Helper files.
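Replacing those names by hand is error-prone if you regenerate artifacts often. The following sketch shows one way to do the substitution programmatically before uploading; the replacement names on the right-hand side are hypothetical examples, so substitute the actual names of your generated Helper files:

```python
# Sketch: replace placeholder Helper names inside a notebook file before upload.
# The "MyProject_..." names are hypothetical; use your real Helper file names.
replacements = {
    "XXX_Deployment": "MyProject_Deployment",
    "XXX_SimpleLoadexecution.ipynb": "MyProject_SimpleLoadexecution.ipynb",
    "XXX_SelectResults.ipynb": "MyProject_SelectResults.ipynb",
}

def patch_notebook(path: str, replacements: dict) -> None:
    # Read the notebook as text, apply all substitutions, write it back.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for old, new in replacements.items():
        text = text.replace(old, new)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# patch_notebook("500_Deploy_and_Load_DataVault_Synapse.ipynb", replacements)
```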

  • Commit all in the demo_synapse branch:
  • Make a pull request from your branch demo_synapse to the collaboration branch:
  • Open the collaboration branch and click on the Publish button:

You're now ready to deploy these artifacts and subsequently load the data based on the Generator you are using.