Is there a way to auto-adjust Excel column widths with pandas.ExcelWriter?

Question 1

I have been asked to generate some Excel reports. I currently use pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However, the fixed column widths are a problem.

The code I have so far is simple enough. Say I have a DataFrame called 'df':

writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")

I looked through the pandas code and I don't see any option for setting column widths. Is there a trick out there to make the columns auto-adjust to the data? Or is there something I can do to the xlsx file after the fact to adjust the column widths?

(I am using the OpenPyXL library and generating .xlsx files, if that makes any difference.)

Thank you.

Question 2

Inspired by user6178746's answer, I have the following:

# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() also writes the file

Question 3

I'm posting this because I just ran into the same issue and found that the official documentation for XlsxWriter and pandas still lists this functionality as unsupported. I hacked together a solution that solved the problem I was having. Basically, I iterate through each column and use worksheet.set_column to set the column width to the maximum length of the contents of that column.

One important note, however: the basic idea fits the column values only, not the column headers. Fitting the headers as well is an easy change. (The code below already includes it: it takes the larger of the header length and the longest value.) Hope this helps someone :)

import pandas as pd
import sqlalchemy as sa
import urllib.parse


read_server = 'serverName'
read_database = 'databaseName'

# urllib.quote_plus is Python 2 only; in Python 3 it lives in urllib.parse
read_params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)

#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')

#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)

#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']

#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() also writes the file
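
For readers on a recent XlsxWriter: as far as I know, releases from 3.0.6 onward provide a worksheet.autofit() method that estimates column widths from the data already written to the sheet. The sketch below prefers that method when it is present and falls back to the manual set_column loop otherwise. The helper name autofit_or_manual and its fallback_widths argument are made up for illustration; worksheet is assumed to be an XlsxWriter worksheet such as writer.sheets['Sheet1'].

```python
def autofit_or_manual(worksheet, fallback_widths):
    """Size columns with worksheet.autofit() if it exists, else one by one.

    fallback_widths: widths (in characters) for columns 0, 1, 2, ...
    Returns which path was taken, mainly to make the behaviour visible.
    """
    if callable(getattr(worksheet, "autofit", None)):
        # Newer XlsxWriter: let the library estimate the widths itself.
        worksheet.autofit()
        return "autofit"
    # Older XlsxWriter: apply the precomputed widths column by column.
    for idx, width in enumerate(fallback_widths):
        worksheet.set_column(idx, idx, width)
    return "manual"
```

Either way, the call has to happen after to_excel has written the data and before the writer is closed.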

Question 4

There is probably no automatic way to do it right now, but since you are using openpyxl, the following line (adapted from another answer by user Bufke on how to do it manually) lets you specify a sane value (in character widths):

writer.sheets['Summary'].column_dimensions['A'].width = 15
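
To go from a fixed width to a computed one with openpyxl, column positions have to be turned into column letters first. Below is a minimal sketch in plain Python. Both column_letter and widths_by_letter are hypothetical helpers (openpyxl ships its own openpyxl.utils.get_column_letter, which could replace the first one), and the padding of 2 is an arbitrary choice.

```python
def column_letter(index):
    """Convert a 1-based column number to an Excel letter (1 -> 'A', 27 -> 'AA')."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def widths_by_letter(headers, columns, first_column=1, padding=2):
    """Map each column letter to the longest text length found in that column.

    headers: column labels; columns: one iterable of cell values per label.
    first_column: 1-based sheet column where the first label lands
    (use 2 when to_excel also writes a single-level index into column A).
    """
    widths = {}
    for offset, (header, values) in enumerate(zip(headers, columns)):
        longest = max([len(str(header))] + [len(str(v)) for v in values])
        widths[column_letter(first_column + offset)] = longest + padding
    return widths


# Hypothetical usage with the question's writer and df:
# widths = widths_by_letter(df.columns, (df[c] for c in df.columns), first_column=2)
# for letter, width in widths.items():
#     writer.sheets['Summary'].column_dimensions[letter].width = width
```

Character counts are only an approximation of the rendered width, so some padding is usually needed.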

Question 5

There is a nice package I started using recently called StyleFrame.

It takes a DataFrame and lets you style it very easily.

By default, the column widths are auto-adjusted.

For example:

from StyleFrame import StyleFrame
import pandas as pd

df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3], 
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()

You can also change the column widths:

sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)

UPDATE 1

In version 1.4, the best_fit argument was added to StyleFrame.to_excel. See the documentation.

UPDATE 2

Here is a code sample that works with StyleFrame 3.x.x:

from styleframe import StyleFrame
import pandas as pd

columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
        'aaaaaaaaaaa': [1, 2, 3, ],
        'bbbbbbbbb': [1, 1, 1, ],
        'ccccccccccc': [2, 3, 4, ],
    }, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
    excel_writer=excel_writer, 
    best_fit=columns,
    columns_and_rows_to_freeze='B2', 
    row_to_add_filters=0,
)
excel_writer.close()  # the underlying pandas ExcelWriter no longer has save() in pandas 2.0+

Question 6

Using pandas and XlsxWriter you can accomplish this; the code below works in Python 3.x. For more details on using XlsxWriter with pandas, this link may be useful: https://xlsxwriter.readthedocs.io/working_with_pandas.html

import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
#set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() also writes the file

Question 7

Dynamically adjust all the column widths

writer = pd.ExcelWriter('/path/to/output/file.xlsx') 
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')

for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)

Manually adjust a column using the column name

col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

Manually adjust a column using the column index

writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

In case any of the above fails with

AttributeError: 'Worksheet' object has no attribute 'set_column'

make sure to install xlsxwriter:

pip install xlsxwriter
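
The error comes from the engine rather than from the loop itself: when no engine is passed, pandas uses xlsxwriter if it is installed and otherwise falls back to openpyxl, whose worksheets have no set_column method. A small sketch of checking this up front with the standard library follows; first_installed is a hypothetical helper name.

```python
import importlib.util


def first_installed(candidates=("xlsxwriter", "openpyxl")):
    """Return the first importable module name from candidates, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


engine = first_installed()
# Hypothetical usage: pass the result explicitly so the worksheet type is known.
# if engine != "xlsxwriter":
#     raise RuntimeError("set_column needs the xlsxwriter engine")
# writer = pd.ExcelWriter('/path/to/output/file.xlsx', engine=engine)
```

Passing engine explicitly also makes the script behave the same on machines with different packages installed.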

Question 8

I found it more useful to size the column based on the column header rather than the column contents.

Using df.columns.values.tolist() I generate a list of the column headers and use the lengths of these headers to determine the column widths.

See the full code below:

import pandas as pd
import xlsxwriter

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)

workbook = writer.book # Access the workbook
worksheet= writer.sheets[sheetname] # Access the Worksheet

header_list = df.columns.values.tolist() # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i])) # Set column widths based on len(header)

writer.close() # Save the excel file (ExcelWriter.save() was removed in pandas 2.0)

Question 9

At work I am always writing DataFrames to Excel files. So instead of writing the same code over and over, I created a module. Now I just import it and use it to write and format the Excel files. There is one downside, though: it takes a long time if the DataFrame is very large. Here is the code:

import os

import pandas as pd


def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                    datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Add a header format.
        format = workbook.add_format({
            'bold': True,
            'border': 1,
            'fg_color': '#0000FF',
            'font_color': 'white'})
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
            # define a max width to not get to wide column
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
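    writerReport.close()  # ExcelWriter.save() was removed in pandas 2.0
    return out_path  # same path the file was written to (output_dir + output_name dropped the separator)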
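
If the width scan is what makes large DataFrames slow, one option is to measure only the first rows of each column and accept that a longer value further down may be clipped. A sketch under that assumption: sampled_width, sample_rows, and the default of 1000 rows are illustrative, while the cap of 50 and the padding on the header mirror the function above.

```python
from itertools import islice


def sampled_width(header, values, sample_rows=1000, max_width=50, padding=2):
    """Estimate a column width from the header and the first sample_rows values."""
    # Start from the padded header length, as in the full scan above.
    longest = len(str(header)) + padding
    # islice stops after sample_rows items, so the cost no longer grows with the data.
    for value in islice(values, sample_rows):
        longest = max(longest, len(str(value)))
    return min(longest, max_width)


# Hypothetical usage inside the column loop above:
# worksheet.set_column(j, j, sampled_width(col, dataframe[col]))
```

Whether this helps depends on where the time is actually spent; writing the cells themselves may dominate for very large sheets.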

Question 10

Combining the other answers and comments, and also supporting multi-indices:

def autosize_excel_columns(worksheet, df):
  autosize_excel_columns_df(worksheet, df.index.to_frame())
  autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)

def autosize_excel_columns_df(worksheet, df, offset=0):
  for idx, col in enumerate(df):
    series = df[col]
    max_len = max((
      series.astype(str).map(len).max(),
      len(str(series.name))
    )) + 1
    worksheet.set_column(idx+offset, idx+offset, max_len)

sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() also writes the file
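
One detail when the columns are a MultiIndex as well: each column label is then a tuple, so len(str(series.name)) measures the tuple's repr rather than any text that ends up in a cell. Since pandas writes each level into its own header row, the longest single level is probably a closer estimate. header_length is a hypothetical helper that could stand in for the len(str(series.name)) term:

```python
def header_length(name):
    """Length of the longest level in a column label (tuple or scalar)."""
    levels = name if isinstance(name, tuple) else (name,)
    return max(len(str(level)) for level in levels)
```

For a plain string or number label this gives the same result as len(str(name)).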

Question 11

import openpyxl
..
for col in _ws.columns:
    max_length = 0
    # Column letter of the first cell, e.g. 'A' (column_letter needs openpyxl >= 2.6)
    col_name = col[0].column_letter
    for cell in col:
        # Skip empty cells; measure everything else by its string form
        if cell.value is not None:
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = max_length + 2
    _ws.column_dimensions[col_name].width = adjusted_width

Question 12

The simplest solution is to specify the column width in the set_column method.

    for worksheet in writer.sheets.values():
        worksheet.set_column(0,last_column_value, required_width_constant)

Question 13

def auto_width_columns(df, sheetname):
    workbook = writer.book  
    worksheet= writer.sheets[sheetname] 

    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)

Question 14

Yes, there is something you can do to the xlsx file after the fact to adjust the column widths: use xlwings to autofit the columns. It is a fairly simple solution; see the last six lines of the example code. The advantage of this approach is that you don't have to worry about font size, font type, or anything else. Requirement: an Excel installation.

import pandas as pd
import xlwings as xw

report_file = "test.xlsx"

df1 = pd.DataFrame([
    ('this is a long term1', 1, 1, 3),
    ('this is a long term2', 1, 2, 5),
    ('this is a long term3', 1, 1, 6),
    ('this is a long term2', 1, 1, 9),
    ], columns=['term', 'aaaa', 'bbbbbbb', "cccccccccccccccccccccccccccccccccccccccccccccc"])

writer = pd.ExcelWriter(report_file, engine="xlsxwriter")
df1.to_excel(writer, sheet_name="Sheet1", index=False)

workbook = writer.book
worksheet1 = writer.sheets["Sheet1"]
num_format = workbook.add_format({"num_format": '#,##0.00'})

worksheet1.set_column("B:D", cell_format=num_format)
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() also writes the file

# Autofit all columns with xlwings.
app = xw.App(visible=False)
wb = xw.Book(report_file)

for ws in wb.sheets:
    ws.autofit(axis="columns")

wb.save(report_file)
app.quit()