4. Dataset and Model Preparation
Cifer’s FHE framework supports encrypting both data and models for privacy-preserving computation.
Data files must be prepared in .npz format (NumPy zipped archive), and models should be saved in .h5 format (Keras HDF5 model format).
Before encryption, you must convert your dataset or model into the required format, as described in the following subsections.
4.1 Prepare Dataset in .npz Format
To use your dataset in the system, organize the data into a .npz file. The file should contain numerical arrays stored under the keys train_images and train_labels, as shown in the example below:
import numpy as np

# X: feature data (e.g., images, tabular features), y: labels
np.savez("datasets/my_dataset.npz", train_images=X, train_labels=y)

Data Requirements:
X must be a NumPy array whose first dimension is the number of samples, e.g., (1000, 20) for 1000 samples with 20 features each.
y must be a 1D array (vector) containing one label per sample, e.g., (1000,).
The dataset must not contain missing values (such as NaN) or malformed entries.
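The requirements above can be checked before the file ever reaches the encryption step. The following is a minimal sketch that assumes only NumPy; the helper names check_arrays, save_dataset, and load_and_check are hypothetical and not part of Cifer's API. It also creates the output directory, because np.savez does not create missing parent directories.

```python
import os

import numpy as np


def check_arrays(X, y):
    """Hypothetical helper: raise ValueError if X, y violate the data requirements."""
    X = np.asarray(X)
    y = np.asarray(y)
    if y.ndim != 1:
        raise ValueError(f"labels must be 1D, got shape {y.shape}")
    # X needs at least one axis, and its first axis must match the number of labels
    if X.ndim < 1 or X.shape[0] != y.shape[0]:
        raise ValueError(f"sample count mismatch: X {X.shape} vs y {y.shape}")
    for name, arr in (("X", X), ("y", y)):
        # Rejects object arrays (e.g. placeholders) and strings
        if not np.issubdtype(arr.dtype, np.number):
            raise ValueError(f"{name} must be numeric, got dtype {arr.dtype}")
        # Rejects NaN and +/-inf, i.e. missing or malformed numeric entries
        if not np.all(np.isfinite(arr)):
            raise ValueError(f"{name} contains NaN or infinite values")


def save_dataset(path, X, y):
    """Hypothetical helper: validate, create the parent directory, write the .npz file."""
    check_arrays(X, y)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    np.savez(path, train_images=X, train_labels=y)


def load_and_check(path):
    """Hypothetical helper: reload a .npz file and re-run the same checks."""
    with np.load(path) as data:
        missing = {"train_images", "train_labels"} - set(data.files)
        if missing:
            raise KeyError(f"missing keys: {sorted(missing)}")
        X, y = data["train_images"], data["train_labels"]
    check_arrays(X, y)
    return X, y
```

For example, save_dataset("datasets/my_dataset.npz", X, y) followed by load_and_check("datasets/my_dataset.npz") raises an error as soon as any requirement is violated, rather than failing later during encryption.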
Additional Conversion Examples
The following examples show how to convert common raw data formats (CSV, images, audio, and video) into .npz files to prepare your dataset for encryption.
CSV file:
import numpy as np

# Assumes one header row (skipped) and the numeric label in the last column
data = np.loadtxt("data/my_data.csv", delimiter=",", skiprows=1)
X = data[:, :-1]  # features
y = data[:, -1]   # labels
np.savez("datasets/my_dataset.npz", train_images=X, train_labels=y)

Image files (PNG):
from PIL import Image
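import numpy as np
import os

image_dir = "data/images"
image_list = []
# Sort filenames so the image order is deterministic and can be matched to labels
for filename in sorted(os.listdir(image_dir)):
    if filename.endswith(".png"):
        # Convert to grayscale; all images must share the same size for np.stack
        img = Image.open(os.path.join(image_dir, filename)).convert("L")
        img_array = np.array(img)
        image_list.append(img_array)

X = np.stack(image_list)  # Shape: (num_images, height, width)
y = np.array([...])  # corresponding labels, one per image, in the same order
np.savez("datasets/my_dataset.npz", train_images=X, train_labels=y)

Audio file (WAV):
import librosa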
import numpy as np

audio_path = 'data/audio/example.wav'
signal, sr = librosa.load(audio_path, sr=None)  # Load audio at its native sampling rate

# Extract 13 MFCCs per frame; result has shape (n_mfcc, frames)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

# Transpose to (frames, features), then add a leading sample axis so that
# one audio file is one sample: (1, frames, 13)
mfcc = mfcc.T[np.newaxis, ...]

# Example: dummy label, one per audio file
labels = np.array([0])  # Replace with actual labels

# Save to npz
np.savez('datasets/audio_dataset.npz', train_images=mfcc, train_labels=labels)

Video file (MP4):
import cv2
import numpy as np

video_path = 'data/video/example.mp4'
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    raise IOError(f"Cannot open video: {video_path}")

frames = []
while True:
    ret, frame = cap.read()
    if not ret:  # No more frames (or a read error)
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    frames.append(gray)
cap.release()

# Stack to (num_frames, height, width), then add a leading sample axis so
# the whole video is one sample: (1, num_frames, height, width)
frames_array = np.stack(frames)[np.newaxis, ...]

# Example: dummy label, one per video file
labels = np.array([0])  # Replace with actual labels
np.savez('datasets/video_dataset.npz', train_images=frames_array, train_labels=labels)

4.2 Prepare Model in .h5 Format for Encryption
Cifer’s FHE framework supports encryption of models saved in Keras’s HDF5 (.h5) format.
Save your trained model using:
model.save("trained_model/my_model.h5")

Example: Creating and Saving a Keras Model
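import os
import numpy as np
from tensorflow import keras

# Load features X (2D: samples x features) and binary 0/1 labels y from Section 4.1
with np.load("datasets/my_dataset.npz") as data:
    X, y = data["train_images"], data["train_labels"]

model = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5)

# Save the model, creating the output directory if needed
os.makedirs("trained_model", exist_ok=True)
model.save("trained_model/my_model.h5")

Additional Notes on Model Formats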
If you have models in other formats (e.g., TensorFlow SavedModel, PyTorch), convert them to .h5 format for compatibility:
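TensorFlow SavedModel:
You can load the SavedModel with Keras and save it as .h5, as explained in the official TensorFlow guide.
Example snippet:
import tensorflow as tf

# Requires Keras 2 (TensorFlow 2.15 or earlier); in Keras 3, load_model
# no longer accepts SavedModel directories
model = tf.keras.models.load_model('saved_model_dir')
model.save('model.h5')

PyTorch:
Export the model to ONNX, then convert it to TensorFlow/Keras.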
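For PyTorch-to-ONNX conversion, see the PyTorch ONNX export documentation.
After ONNX export, use a converter such as onnx-tf to convert the model to TensorFlow, then save it as .h5. Note that onnx-tf writes a TensorFlow SavedModel rather than a Keras model, so this last step may require additional conversion work.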
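Whichever route produced your files, it can help to confirm that each one really is in the expected container format before encryption, for example that a converted model was written as an HDF5 file rather than left as a SavedModel directory. The sketch below uses only the Python standard library and compares each file's leading bytes with the format's signature: .npz files written by np.savez are ZIP archives (starting with PK\x03\x04), and HDF5 files start with the 8-byte signature \x89HDF\r\n\x1a\n. The preflight helper is hypothetical, not part of Cifer's tooling, and it only checks the signature at offset 0, so an HDF5 file written with a user block would be reported as a mismatch.

```python
import os

# Magic bytes at the start of each container format (checked at offset 0 only).
# .npz files written by np.savez are ZIP archives: local file header "PK\x03\x04".
# .h5 files are HDF5: 8-byte superblock signature "\x89HDF\r\n\x1a\n".
SIGNATURES = {
    ".npz": b"PK\x03\x04",
    ".h5": b"\x89HDF\r\n\x1a\n",
}


def preflight(path):
    """Hypothetical helper: return None if `path` looks valid, else a problem description."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SIGNATURES:
        return f"{path}: unsupported extension {ext!r} (expected .npz or .h5)"
    if not os.path.isfile(path):
        return f"{path}: file not found (or is a directory)"
    magic = SIGNATURES[ext]
    with open(path, "rb") as f:
        head = f.read(len(magic))
    if head != magic:
        return f"{path}: leading bytes do not match the {ext} signature"
    return None


if __name__ == "__main__":
    # Example paths taken from the snippets in this section
    for p in ("datasets/my_dataset.npz", "trained_model/my_model.h5"):
        problem = preflight(p)
        print(problem or f"{p}: OK")
```

A signature check is deliberately shallow: it catches wrong extensions, missing files, and unconverted outputs cheaply, but it does not prove the archive contains the train_images and train_labels keys or that the HDF5 file holds a loadable Keras model.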
Now your dataset and model files are properly prepared for FHE encryption. Proceed to the next step to perform encryption and integrate them into your federated learning workflow.