4. Dataset and Model Preparation

Cifer’s FHE framework supports encrypting both data and models for privacy-preserving computation.

Data files must be prepared in .npz format (NumPy compressed archive).
Models should be saved in .h5 format (Keras model format).

Before encryption, you must prepare your dataset or model appropriately.

4.1 Prepare Dataset in `.npz` Format

To use your dataset in the system, organize the data into a .npz file. The file should contain numerical arrays stored with keys train_images and train_labels, as shown in the example below:

python

import numpy as np

# X: feature data (e.g., images, tabular features), y: labels
np.savez("datasets/my_dataset.npz", train_images=X, train_labels=y)
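Note that np.savez does not create missing parent directories, so the call above fails with FileNotFoundError if datasets/ does not exist yet. The following is a minimal sketch of saving the archive and reading it back to confirm its keys and shapes. The placeholder arrays and the temporary directory are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

import numpy as np

# Placeholder data (assumption): 1000 samples with 20 features each.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# A temporary directory stands in for your project root.
root = Path(tempfile.mkdtemp())
out_path = root / "datasets" / "my_dataset.npz"

# Create the parent directory first; np.savez will not create it.
out_path.parent.mkdir(parents=True, exist_ok=True)
np.savez(out_path, train_images=X, train_labels=y)

# Read the archive back to confirm the keys and shapes.
with np.load(out_path) as archive:
    keys = sorted(archive.files)
    loaded_X = archive["train_images"]
    loaded_y = archive["train_labels"]

print(keys, loaded_X.shape, loaded_y.shape)
```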

Data Requirements:

X must be a NumPy array with an appropriate shape, e.g., (1000, 20) for 1000 samples with 20 features each.
y must be a 1D array (vector) of labels, e.g., (1000,).
The dataset must not contain missing values or malformed entries.
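
The requirements above can be checked before saving. The helper below is a sketch, not part of Cifer's API: validate_dataset is a hypothetical name, and the checks that X has at least two dimensions and that X and y have the same number of samples are assumptions inferred from the example shapes.

```python
import numpy as np


def validate_dataset(X, y):
    """Raise ValueError if X and y do not meet the data requirements."""
    X = np.asarray(X)
    y = np.asarray(y)

    # Both arrays must hold numbers (no strings or Python objects).
    if not np.issubdtype(X.dtype, np.number) or not np.issubdtype(y.dtype, np.number):
        raise ValueError("X and y must be numeric arrays")

    # Assumption: the first axis of X indexes samples, so X needs >= 2 dimensions.
    if X.ndim < 2:
        raise ValueError(f"X must have at least 2 dimensions, got shape {X.shape}")

    # Labels must be a 1D vector.
    if y.ndim != 1:
        raise ValueError(f"y must be 1D, got shape {y.shape}")

    # Assumption: one label per sample.
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} samples but y has {y.shape[0]} labels")

    # Missing values show up as NaN in numeric arrays.
    if np.isnan(X).any() or np.isnan(y).any():
        raise ValueError("dataset contains missing values (NaN)")


# Usage: a valid dataset passes silently.
validate_dataset(np.zeros((1000, 20)), np.zeros(1000))
```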

Additional Conversion Examples

The examples below show how to convert common raw data formats into .npz files so the dataset is ready for encryption. They appear in this order: CSV, PNG images, audio (WAV), and video (MP4).

python

import numpy as np

data = np.loadtxt("data/my_data.csv", delimiter=",", skiprows=1)
X = data[:, :-1]  # features
y = data[:, -1]   # labels

np.savez("datasets/my_dataset.npz", train_images=X, train_labels=y)
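
np.loadtxt raises an error when a field is empty, so a CSV file with missing values cannot be loaded this way. One possible approach, sketched below, is to load with np.genfromtxt, which fills empty fields with NaN, and then drop incomplete rows. The in-memory CSV text is a stand-in for data/my_data.csv, and dropping rows (rather than imputing values) is an assumption about what suits your data.

```python
import io

import numpy as np

# In-memory stand-in for data/my_data.csv (assumption):
# a header row, two feature columns, and one label column.
csv_text = "f1,f2,label\n1.0,2.0,0\n3.0,,1\n5.0,6.0,1\n"

# genfromtxt fills empty fields with NaN instead of raising an error.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# Keep only the rows where every column is present.
complete = ~np.isnan(data).any(axis=1)
data = data[complete]

X = data[:, :-1]  # features
y = data[:, -1]   # labels
print(X.shape, y.shape)
```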

python

from PIL import Image
import numpy as np
import os

image_dir = "data/images"
image_list = []

for filename in sorted(os.listdir(image_dir)):  # sorted: os.listdir order is arbitrary, and labels must match image order
    if filename.endswith(".png"):
        img = Image.open(os.path.join(image_dir, filename)).convert("L")  # grayscale
        img_array = np.array(img)
        image_list.append(img_array)

X = np.stack(image_list)
y = np.array([...])  # corresponding labels

np.savez("datasets/my_dataset.npz", train_images=X, train_labels=y)
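
The labels must line up with the images in the order they were stacked. As one illustration, the sketch below derives labels from file names. The "<label>_<id>.png" naming convention and the file names are purely hypothetical; adapt the parsing to however your labels are actually stored.

```python
import numpy as np

# Hypothetical file names following a "<label>_<id>.png" convention (assumption).
filenames = ["1_0003.png", "0_0001.png", "1_0002.png", "notes.txt"]

# Sort so the order is reproducible; directory listings come back in arbitrary order.
png_files = sorted(f for f in filenames if f.endswith(".png"))

# Parse the integer label from the part before the first underscore.
y = np.array([int(name.split("_", 1)[0]) for name in png_files])
print(png_files, y)
```

Load the images in the same sorted order so that row i of train_images corresponds to entry i of train_labels.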

python

import librosa
import numpy as np

audio_path = 'data/audio/example.wav'
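# Load the audio file; sr=None keeps the file's native sampling rate.
# The waveform is named `signal` so it is not confused with the label vector y.
signal, sr = librosa.load(audio_path, sr=None)

# Extract MFCC features; the result has shape (n_mfcc, frames)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

# Transpose to shape (frames, features)
mfcc = mfcc.T

# Example: dummy label for this audio file. Replace with actual labels,
# and make sure the number of labels matches the number of samples you save.
labels = np.array([0])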

# Save to npz
np.savez('datasets/audio_dataset.npz', train_images=mfcc, train_labels=labels)
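
In the example above, train_images has one row per frame while train_labels has a single entry. If you need one sample per audio file, one common option is to average the MFCC frames of each file into a fixed-length vector. The sketch below uses random arrays as stand-ins for real MFCC matrices, and the per-file labels are hypothetical. Mean pooling is an assumption about what suits your task, not a Cifer requirement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-file MFCC matrices of shape (frames, 13);
# the number of frames differs from file to file.
mfcc_per_file = [rng.normal(size=(n_frames, 13)) for n_frames in (120, 95, 143)]
file_labels = [0, 1, 0]  # hypothetical labels, one per file

# Average over the time axis so each file becomes one 13-dimensional vector.
X = np.stack([m.mean(axis=0) for m in mfcc_per_file])
y = np.array(file_labels)
print(X.shape, y.shape)
```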

python

import cv2
import numpy as np

video_path = 'data/video/example.mp4'
cap = cv2.VideoCapture(video_path)

frames = []
ret = True
while ret:
    ret, frame = cap.read()
    if ret:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(gray)

cap.release()

frames_array = np.stack(frames)  # Shape: (num_frames, height, width)

# Example: dummy label for this video. Replace with actual labels,
# and make sure the number of labels matches the number of samples you save.
labels = np.array([0])

np.savez('datasets/video_dataset.npz', train_images=frames_array, train_labels=labels)
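
If each frame is treated as one sample, the label vector needs one entry per frame. The sketch below subsamples the frames and repeats a clip-level label for every kept frame. The zero-filled array stands in for frames_array, and the label value and the sampling step of 5 are hypothetical.

```python
import numpy as np

# Stand-in for frames_array: 30 grayscale frames of 48x64 pixels (assumption).
frames_array = np.zeros((30, 48, 64), dtype=np.uint8)
clip_label = 1  # hypothetical label for the whole clip

# Keep every 5th frame to reduce the dataset size.
sampled = frames_array[::5]

# Repeat the clip-level label once per kept frame.
labels = np.full(sampled.shape[0], clip_label)
print(sampled.shape, labels.shape)
```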

4.2 Prepare Model in `.h5` Format for Encryption

Cifer’s FHE framework supports encryption of models saved in Keras’s HDF5 (.h5) format.

Save your trained model using:

python

model.save("trained_model/my_model.h5")

Example: Creating and Saving a Keras Model

python

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid")
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5)

# Save the model
model.save("trained_model/my_model.h5")
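
Before passing a file on for encryption, you can run a quick sanity check that it really is an HDF5 file, since standard HDF5 files begin with a fixed 8-byte signature. This is a sketch only: looks_like_hdf5 is a hypothetical helper, the files written here are dummies rather than real models, and the check does not confirm that Keras can load the model.

```python
import tempfile
from pathlib import Path

# First 8 bytes of a standard HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE


# Dummy files for demonstration (not real models).
tmp = Path(tempfile.mkdtemp())
good = tmp / "fake_model.h5"
good.write_bytes(HDF5_SIGNATURE + b"\x00" * 16)
bad = tmp / "not_a_model.h5"
bad.write_bytes(b"plain text")

print(looks_like_hdf5(good), looks_like_hdf5(bad))
```

Files written with an HDF5 user block place the signature at a later offset, so this check only covers the common case.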

Additional Notes on Model Formats

If you have models in other formats (e.g., TensorFlow SavedModel, PyTorch), convert them to .h5 format for compatibility.

TensorFlow SavedModel

Load the SavedModel and save it as .h5, as explained in the official TensorFlow guide.

Example snippet:

python

import tensorflow as tf

model = tf.keras.models.load_model('saved_model_dir')
model.save('model.h5')

PyTorch

Export the model to ONNX, then convert it to TensorFlow/Keras.
For PyTorch-to-ONNX conversion, see the PyTorch ONNX export docs.
After the ONNX export, use a converter such as onnx-tf to convert the model to TensorFlow, then save it as .h5.

Now your dataset and model files are properly prepared for FHE encryption. Proceed to the next step to perform encryption and integrate them into your federated learning workflow.
